Discussion:
RADIUS Monitoring tool
Clement Ogedengbe
2015-02-25 13:28:50 UTC
On two occasions in the last two weeks, our RADIUS server suddenly started to reject ALL users. We have a failover system set up, but unfortunately it did not kick in because the RADIUS service was still running; it was simply rejecting every user for some unknown reason.

Does anyone know of a monitoring script or tool that can test that the RADIUS server is authenticating properly, and that can send an alert by email or text if the server rejects a valid user's credentials a number of times?

Best Regards

Clement Ogedengbe


-----Original Message-----
From: Freeradius-Users [mailto:freeradius-users-bounces+c.ogedengbe=***@lists.freeradius.org] On Behalf Of Tevfik Ceydeliler
Sent: 25 February 2015 12:44
To: freeradius-***@lists.freeradius.org
Subject: Re: GGSN/APN Freeradius and Proxy
"Tue Feb 24 11:40:21 2015: sending reject for vantacgida4's query from 10.43.1.51"
That does not help me understand why it was rejected.
So… ask the Kobil people why their RADIUS server is broken.
No support we paid :(
So… use FreeRADIUS to edit the proxied packet, so that it looks more like the one from radtest. That’s what the “pre-proxy” section is for. There are lots of examples and documentation for this.
rad_recv: Access-Request packet from host 172.30.80.1 port 24208,
Calling-Station-Id = "905344776557"
User-Name = "vantacgida4"
...
User-Password = "5080+00526417"
Does that name / password work for radtest? If not, then stop
wasting your time, and throw the home server in the garbage. Get one
that works.
I can't test this user because it belongs to a reseller, but I created another user and can test with that one.
########################################################
***@radiuspnb:/etc/freeradius# radtest kivanccepel 475224928708 10.1.1.51 10 geheim
Sending Access-Request of id 21 to 10.1.1.51 port 1812
User-Name = "kivanccepel"
User-Password = "475224928708"
NAS-IP-Address = 127.0.1.1
NAS-Port = 10
Message-Authenticator = 0x00000000000000000000000000000000
rad_recv: Access-Accept packet from host 10.1.1.51 port 1812, id=21,
length=2
###########################################################
As you can see, it works.
But it does not work from the GGSN.
I really wish I could throw that home server in the garbage, but more than 300 resellers connect via this home server.

OK, let's change the home server. I have another one for internal use.
In this case:
##################################################################
rad_recv: Access-Request packet from host 172.30.80.1 port 24144, id=10, length=377
Calling-Station-Id = "905303630245"
User-Name = "biryudumgida3"
NAS-IP-Address = 172.30.80.1
NAS-Identifier = "MTCGGSNK3"
Service-Type = Framed-User
Framed-Protocol = GPRS-PDP-Context
NAS-Port-Type = Wireless-Other
3GPP-IMSI = "286015918760926"
3GPP-IMSI-MCC-MNC = "28601"
3GPP-NSAPI = "5"
3GPP-Selection-Mode = "0"
3GPP-Charging-ID = 50711443
3GPP-GPRS-Negotiated-QoS-profile = "05-13921F7396F7FE74620846006400"
3GPP-Charging-Characteristics = "0800"
Called-Station-Id = "yasarapn"
3GPP-SGSN-Address = 86.108.153.116
3GPP-SGSN-MCC-MNC = "28601"
3GPP-GGSN-Address = 86.108.153.126
3GPP-GGSN-MCC-MNC = "28601"
3GPP-Negotiated-DSCP = 18
3GPP-RAT-Type = 1
3GPP-Location-Info = 0x0182f610eb2acd62
3GPP-Attr-23 = 0x8020
3GPP-IMEISV = "9800670040325323"
3GPP-PDP-Type = 0
NAS-Port = 41524
User-Password = "645327067460"
3GPP-Charging-Gateway-Address = 10.200.211.27
# Executing section authorize from file /etc/freeradius/sites-enabled/default
+- entering group authorize {...}
++[preprocess] returns ok
++[chap] returns noop
++[mschap] returns noop
++[digest] returns noop
[suffix] No '@' in User-Name = "biryudumgida3", looking up realm NULL
[suffix] No such realm "NULL"
++[suffix] returns noop
[eap] No EAP-Message, not doing EAP
++[eap] returns noop
++[files] returns noop
[sql] expand: %{User-Name} -> biryudumgida3
[sql] sql_set_user escaped user --> 'biryudumgida3'
rlm_sql (sql): Reserving sql socket id: 4
[sql] expand: SELECT id, username, attribute, value, op
FROM radcheck WHERE username = '%{SQL-User-Name}'
ORDER BY id -> SELECT id, username, attribute, value, op FROM
radcheck WHERE username = 'biryudumgida3' ORDER BY id
[sql] User found in radcheck table
[sql] expand: SELECT id, username, attribute, value, op
FROM radreply WHERE username = '%{SQL-User-Name}'
ORDER BY id -> SELECT id, username, attribute, value, op FROM
radreply WHERE username = 'biryudumgida3' ORDER BY id
[sql] expand: SELECT groupname FROM radusergroup
WHERE username = '%{SQL-User-Name}' ORDER BY priority -> SELECT
groupname FROM radusergroup WHERE username =
'biryudumgida3' ORDER BY priority
[sql] expand: SELECT id, groupname, attribute, Value,
op FROM radgroupcheck WHERE groupname =
'%{Sql-Group}' ORDER BY id -> SELECT id, groupname,
attribute, Value, op FROM radgroupcheck
WHERE groupname = 'UGR_TcellOtonomYBB-Secovid' ORDER BY id
[sql] User found in group UGR_TcellOtonomYBB-Secovid
[sql] expand: SELECT id, groupname, attribute, value,
op FROM radgroupreply WHERE groupname =
'%{Sql-Group}' ORDER BY id -> SELECT id, groupname,
attribute, value, op FROM radgroupreply
WHERE groupname = 'UGR_TcellOtonomYBB-Secovid' ORDER BY id
rlm_sql (sql): Released sql socket id: 4
++[sql] returns ok
++[expiration] returns noop
++[logintime] returns noop
++[pap] returns noop
WARNING: Empty pre-proxy section. Using default return values.
Sending Access-Request of id 80 to 10.1.1.51 port 1812
Calling-Station-Id = "905303630245"
User-Name = "biryudumgida3"
NAS-IP-Address = 172.30.80.1
NAS-Identifier = "MTCGGSNK3"
Service-Type = Framed-User
Framed-Protocol = GPRS-PDP-Context
NAS-Port-Type = Wireless-Other
3GPP-IMSI = "286015918760926"
3GPP-IMSI-MCC-MNC = "28601"
3GPP-NSAPI = "5"
3GPP-Selection-Mode = "0"
3GPP-Charging-ID = 50711443
3GPP-GPRS-Negotiated-QoS-profile = "05-13921F7396F7FE74620846006400"
3GPP-Charging-Characteristics = "0800"
Called-Station-Id = "yasarapn"
3GPP-SGSN-Address = 86.108.153.116
3GPP-SGSN-MCC-MNC = "28601"
3GPP-GGSN-Address = 86.108.153.126
3GPP-GGSN-MCC-MNC = "28601"
3GPP-Negotiated-DSCP = 18
3GPP-RAT-Type = 1
3GPP-Location-Info = 0x0182f610eb2acd62
3GPP-Attr-23 = 0x8020
3GPP-IMEISV = "9800670040325323"
3GPP-PDP-Type = 0
NAS-Port = 41524
User-Password = "645327067460"
3GPP-Charging-Gateway-Address = 10.200.211.27
Proxy-State = 0x3130
Proxying request 4 to home server 10.1.1.51 port 1812
Sending Access-Request of id 80 to 10.1.1.51 port 1812
Calling-Station-Id = "905303630245"
User-Name = "biryudumgida3"
NAS-IP-Address = 172.30.80.1
NAS-Identifier = "MTCGGSNK3"
Service-Type = Framed-User
Framed-Protocol = GPRS-PDP-Context
NAS-Port-Type = Wireless-Other
3GPP-IMSI = "286015918760926"
3GPP-IMSI-MCC-MNC = "28601"
3GPP-NSAPI = "5"
3GPP-Selection-Mode = "0"
3GPP-Charging-ID = 50711443
3GPP-GPRS-Negotiated-QoS-profile = "05-13921F7396F7FE74620846006400"
3GPP-Charging-Characteristics = "0800"
Called-Station-Id = "yasarapn"
3GPP-SGSN-Address = 86.108.153.116
3GPP-SGSN-MCC-MNC = "28601"
3GPP-GGSN-Address = 86.108.153.126
3GPP-GGSN-MCC-MNC = "28601"
3GPP-Negotiated-DSCP = 18
3GPP-RAT-Type = 1
3GPP-Location-Info = 0x0182f610eb2acd62
3GPP-Attr-23 = 0x8020
3GPP-IMEISV = "9800670040325323"
3GPP-PDP-Type = 0
NAS-Port = 41524
User-Password = "645327067460"
3GPP-Charging-Gateway-Address = 10.200.211.27
Proxy-State = 0x3130
Going to the next request
Waking up in 0.9 seconds.
rad_recv: Access-Accept packet from host 10.1.1.51 port 1812, id=80,
length=24
Proxy-State = 0x3130
# Executing section post-proxy from file /etc/freeradius/sites-enabled/default
+- entering group post-proxy {...}
[eap] No pre-existing handler found
++[eap] returns noop
Found Auth-Type = Accept
Auth-Type = Accept, accepting the user
# Executing section post-auth from file /etc/freeradius/sites-enabled/default
+- entering group post-auth {...}
rlm_sql (sql): Reserving sql socket id: 3
[sqlippool] expand: %{User-Name} -> biryudumgida3
[sqlippool] sql_set_user escaped user --> 'biryudumgida3'
[sqlippool] expand: START TRANSACTION -> START TRANSACTION
[sqlippool] expand: UPDATE radippool SET nasipaddress = '',
pool_key = 0, callingstationid = '', username = '', expiry_time =
NULL WHERE expiry_time <= NOW() - INTERVAL 1 SECOND AND nasipaddress
= '%{Nas-IP-Address}' -> UPDATE radippool SET nasipaddress = '',
pool_key = 0, callingstationid = '', username = '', expiry_time =
NULL WHERE expiry_time <= NOW() - INTERVAL 1 SECOND AND nasipaddress
= '172.30.80.1'
[sqlippool] expand: SELECT framedipaddress FROM radippool WHERE
pool_name = '%{control:Pool-Name}' AND (expiry_time < NOW() OR expiry_time IS NULL) ORDER BY (username <> '%{User-Name}'), (callingstationid <> '%{Calling-Station-Id}'), expiry_time LIMIT 1 FOR UPDATE -> SELECT framedipaddress FROM radippool WHERE pool_name = 'IP_TcellOtonomYBB' AND (expiry_time < NOW() OR expiry_time IS NULL) ORDER BY (username <> 'biryudumgida3'), (callingstationid <> '905303630245'), expiry_time LIMIT 1 FOR UPDATE
[sqlippool] expand: UPDATE radippool SET nasipaddress =
'%{NAS-IP-Address}', pool_key = '%{NAS-Port}', callingstationid = '%{Calling-Station-Id}', username = '%{User-Name}', expiry_time = NOW()
+ INTERVAL 21600 SECOND WHERE framedipaddress = '172.30.64.190' AND
expiry_time IS NULL -> UPDATE radippool SET nasipaddress = '172.30.80.1', pool_key = '41524', callingstationid = '905303630245', username = 'biryudumgida3', expiry_time = NOW() + INTERVAL 21600 SECOND WHERE framedipaddress = '172.30.64.190' AND expiry_time IS NULL
[sqlippool] Allocated IP 172.30.64.190 [be401eac]
[sqlippool] expand: COMMIT -> COMMIT
rlm_sql (sql): Released sql socket id: 3
[sqlippool] expand: Allocated IP: %{reply:Framed-IP-Address} from
%{control:Pool-Name} (did %{Called-Station-Id} cli
%{Calling-Station-Id} port %{NAS-Port} user %{User-Name}) -> Allocated
IP: 172.30.64.190 from IP_TcellOtonomYBB (did yasarapn cli
905303630245 port 41524 user biryudumgida3)
Allocated IP: 172.30.64.190 from IP_TcellOtonomYBB (did yasarapn cli
905303630245 port 41524 user biryudumgida3)
++[sqlippool] returns ok
++[exec] returns noop
Sending Access-Accept of id 10 to 172.30.80.1 port 24144
Framed-IP-Address = 172.30.64.190
Finished request 4.
Going to the next request
Waking up in 4.9 seconds.
Cleaning up request 4 ID 10 with timestamp +133
Ready to process requests.
rad_recv: Access-Request packet from host 172.30.80.1 port 24144, id=10,
length=377
Calling-Station-Id = "905303630245"
User-Name = "biryudumgida3"
NAS-IP-Address = 172.30.80.1
NAS-Identifier = "MTCGGSNK3"
Service-Type = Framed-User
Framed-Protocol = GPRS-PDP-Context
NAS-Port-Type = Wireless-Other
3GPP-IMSI = "286015918760926"
3GPP-IMSI-MCC-MNC = "28601"
3GPP-NSAPI = "5"
3GPP-Selection-Mode = "0"
3GPP-Charging-ID = 50711443
3GPP-GPRS-Negotiated-QoS-profile = "05-13921F7396F7FE74620846006400"
3GPP-Charging-Characteristics = "0800"
Called-Station-Id = "yasarapn"
3GPP-SGSN-Address = 86.108.153.116
3GPP-SGSN-MCC-MNC = "28601"
3GPP-GGSN-Address = 86.108.153.126
3GPP-GGSN-MCC-MNC = "28601"
3GPP-Negotiated-DSCP = 18
3GPP-RAT-Type = 1
3GPP-Location-Info = 0x0182f610eb2acd62
3GPP-Attr-23 = 0x8020
3GPP-IMEISV = "9800670040325323"
3GPP-PDP-Type = 0
NAS-Port = 41524
User-Password = "645327067460"
3GPP-Charging-Gateway-Address = 10.200.211.27
# Executing section authorize from file /etc/freeradius/sites-enabled/default
+- entering group authorize {...}
++[preprocess] returns ok
++[chap] returns noop
++[mschap] returns noop
++[digest] returns noop
[suffix] No '@' in User-Name = "biryudumgida3", looking up realm NULL
[suffix] No such realm "NULL"
++[suffix] returns noop
[eap] No EAP-Message, not doing EAP
++[eap] returns noop
++[files] returns noop
[sql] expand: %{User-Name} -> biryudumgida3
[sql] sql_set_user escaped user --> 'biryudumgida3'
rlm_sql (sql): Reserving sql socket id: 2
[sql] expand: SELECT id, username, attribute, value, op
FROM radcheck WHERE username = '%{SQL-User-Name}'
ORDER BY id -> SELECT id, username, attribute, value, op FROM
radcheck WHERE username = 'biryudumgida3' ORDER BY id
[sql] User found in radcheck table
[sql] expand: SELECT id, username, attribute, value, op
FROM radreply WHERE username = '%{SQL-User-Name}'
ORDER BY id -> SELECT id, username, attribute, value, op FROM
radreply WHERE username = 'biryudumgida3' ORDER BY id
[sql] expand: SELECT groupname FROM radusergroup
WHERE username = '%{SQL-User-Name}' ORDER BY priority -> SELECT
groupname FROM radusergroup WHERE username =
'biryudumgida3' ORDER BY priority
[sql] expand: SELECT id, groupname, attribute, Value,
op FROM radgroupcheck WHERE groupname =
'%{Sql-Group}' ORDER BY id -> SELECT id, groupname,
attribute, Value, op FROM radgroupcheck
WHERE groupname = 'UGR_TcellOtonomYBB-Secovid' ORDER BY id
[sql] User found in group UGR_TcellOtonomYBB-Secovid
[sql] expand: SELECT id, groupname, attribute, value,
op FROM radgroupreply WHERE groupname =
'%{Sql-Group}' ORDER BY id -> SELECT id, groupname,
attribute, value, op FROM radgroupreply
WHERE groupname = 'UGR_TcellOtonomYBB-Secovid' ORDER BY id
rlm_sql (sql): Released sql socket id: 2
++[sql] returns ok
++[expiration] returns noop
++[logintime] returns noop
++[pap] returns noop
WARNING: Empty pre-proxy section. Using default return values.
Sending Access-Request of id 101 to 10.1.1.51 port 1812
Calling-Station-Id = "905303630245"
User-Name = "biryudumgida3"
NAS-IP-Address = 172.30.80.1
NAS-Identifier = "MTCGGSNK3"
Service-Type = Framed-User
Framed-Protocol = GPRS-PDP-Context
NAS-Port-Type = Wireless-Other
3GPP-IMSI = "286015918760926"
3GPP-IMSI-MCC-MNC = "28601"
3GPP-NSAPI = "5"
3GPP-Selection-Mode = "0"
3GPP-Charging-ID = 50711443
3GPP-GPRS-Negotiated-QoS-profile = "05-13921F7396F7FE74620846006400"
3GPP-Charging-Characteristics = "0800"
Called-Station-Id = "yasarapn"
3GPP-SGSN-Address = 86.108.153.116
3GPP-SGSN-MCC-MNC = "28601"
3GPP-GGSN-Address = 86.108.153.126
3GPP-GGSN-MCC-MNC = "28601"
3GPP-Negotiated-DSCP = 18
3GPP-RAT-Type = 1
3GPP-Location-Info = 0x0182f610eb2acd62
3GPP-Attr-23 = 0x8020
3GPP-IMEISV = "9800670040325323"
3GPP-PDP-Type = 0
NAS-Port = 41524
User-Password = "645327067460"
3GPP-Charging-Gateway-Address = 10.200.211.27
Proxy-State = 0x3130
Proxying request 5 to home server 10.1.1.51 port 1812
Sending Access-Request of id 101 to 10.1.1.51 port 1812
Calling-Station-Id = "905303630245"
User-Name = "biryudumgida3"
NAS-IP-Address = 172.30.80.1
NAS-Identifier = "MTCGGSNK3"
Service-Type = Framed-User
Framed-Protocol = GPRS-PDP-Context
NAS-Port-Type = Wireless-Other
3GPP-IMSI = "286015918760926"
3GPP-IMSI-MCC-MNC = "28601"
3GPP-NSAPI = "5"
3GPP-Selection-Mode = "0"
3GPP-Charging-ID = 50711443
3GPP-GPRS-Negotiated-QoS-profile = "05-13921F7396F7FE74620846006400"
3GPP-Charging-Characteristics = "0800"
Called-Station-Id = "yasarapn"
3GPP-SGSN-Address = 86.108.153.116
3GPP-SGSN-MCC-MNC = "28601"
3GPP-GGSN-Address = 86.108.153.126
3GPP-GGSN-MCC-MNC = "28601"
3GPP-Negotiated-DSCP = 18
3GPP-RAT-Type = 1
3GPP-Location-Info = 0x0182f610eb2acd62
3GPP-Attr-23 = 0x8020
3GPP-IMEISV = "9800670040325323"
3GPP-PDP-Type = 0
NAS-Port = 41524
User-Password = "645327067460"
3GPP-Charging-Gateway-Address = 10.200.211.27
Proxy-State = 0x3130
Going to the next request
Waking up in 0.9 seconds.
rad_recv: Access-Reject packet from host 10.1.1.51 port 1812, id=101,
length=24
Proxy-State = 0x3130
# Executing section post-proxy from file /etc/freeradius/sites-enabled/default
+- entering group post-proxy {...}
[eap] No pre-existing handler found
++[eap] returns noop
Using Post-Auth-Type Reject
# Executing group from file /etc/freeradius/sites-enabled/default
+- entering group REJECT {...}
[attr_filter.access_reject] expand: %{User-Name} -> biryudumgida3
attr_filter: Matched entry DEFAULT at line 11
++[attr_filter.access_reject] returns updated
Delaying reject of request 5 for 1 seconds
Going to the next request
Waking up in 0.9 seconds.
Sending delayed reject for request 5
Sending Access-Reject of id 10 to 172.30.80.1 port 24144
Waking up in 4.9 seconds.
Cleaning up request 5 ID 10 with timestamp +143
Ready to process requests.

####################################################
The user comes from the GGSN.
SQL detects the username, the IP pool and the profile.
FreeRADIUS receives an Access-Accept message from the home server:

rad_recv: Access-Accept packet from host 10.1.1.51 port 1812, id=80,
length=24
Proxy-State = 0x3130
# Executing section post-proxy from file /etc/freeradius/sites-enabled/default
+- entering group post-proxy {...}
[eap] No pre-existing handler found
++[eap] returns noop
Found Auth-Type = Accept
Auth-Type = Accept, accepting the user
# Executing section post-auth from file /etc/freeradius/sites-enabled/default
+- entering group post-auth {...}
rlm_sql (sql): Reserving sql socket id: 3
[sqlippool] expand: %{User-Name} -> biryudumgida3
[sqlippool] sql_set_user escaped user --> 'biryudumgida3'
[sqlippool] expand: START TRANSACTION -> START TRANSACTION

Then the SQL queries run again,
again and again.
I really don't know why this happens.
Alan DeKok.
-
List info/subscribe/unsubscribe? See
http://www.freeradius.org/list/users.html
Stefan Paetow
2015-02-25 13:37:34 UTC
Post by Clement Ogedengbe
Does anyone know of any monitoring script/tool that can be used to test that the RADIUS server is authenticating properly and which can send an alert by email or text in the event that the server rejects authentication of a valid user credentials a number of times.
An easy option is to use rad_eap_test (it requires eapol_test) with crond to run an authentication at the desired interval. When it starts throwing a wobbly, have it email you :-)
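
A minimal sketch of that idea, with one swap stated plainly: it calls radtest (the tool used earlier in this thread) rather than rad_eap_test, because radtest's positional arguments are shown above. The account name, password, server address and shared secret are placeholders, not values from this thread, and a real deployment would want a dedicated monitoring account.

```python
#!/usr/bin/env python3
"""Cron-driven RADIUS authentication probe (illustrative sketch only)."""
import subprocess
import sys

# Hypothetical values: replace with a dedicated monitoring account.
USER, PASSWORD = "monitor-user", "monitor-pass"
SERVER, NAS_PORT, SECRET = "127.0.0.1", "10", "testing123"


def auth_succeeded(radtest_output: str) -> bool:
    """True only if the output shows the server answered with an Access-Accept."""
    return "Access-Accept" in radtest_output


def run_probe(timeout: float = 15.0) -> bool:
    """Run radtest once and report whether the test user was accepted."""
    cmd = ["radtest", USER, PASSWORD, SERVER, NAS_PORT, SECRET]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False  # radtest missing or hung: treat as a failed probe
    return auth_succeeded(result.stdout)


def main() -> int:
    """Return 0 when the probe passes, 1 (with a message on stderr) otherwise."""
    if run_probe():
        return 0
    print("RADIUS probe failed: test user was not accepted", file=sys.stderr)
    return 1
```

To run it from crond, end the file with `sys.exit(main())`. cron normally mails anything a job prints to the job owner (or to MAILTO), provided local mail delivery is set up, so a silent success and a noisy failure is enough to get the email.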

Stefan Paetow
Moonshot Industry & Research Liaison Coordinator

t: +44 (0)1235 822 125
gpg: 0x3FCE5142
xmpp: ***@jabber.dev.ja.net
skype: stefan.paetow.janet
Lumen House, Library Avenue, Harwell Oxford, Didcot, OX11 0SG

jisc.ac.uk

Michael Schwartzkopff
2015-02-25 13:51:46 UTC
Post by Clement Ogedengbe
On two occasions in the last 2 weeks, our RADIUS server suddenly started to
reject ALL users. Even though we have set up a failover system.
Unfortunately, the fail-over system did not kick in because the RADIUS
service was still running, only that it was rejecting all users for some
strange reasons.
Does anyone know of any monitoring script/tool that can be used to test that
the RADIUS server is authenticating properly and which can send an alert by
email or text in the event that the server rejects authentication of a
valid user credentials a number of times.
Best Regards
Clement Ogedengbe
Include the RADIUS service in your monitoring tool. The best tools are OpenNMS,
Zabbix or Nagios.

If you do not have a monitoring tool, set one up.

Kind regards,

Michael Schwartzkopff
--
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Matthew Newton
2015-02-25 14:56:13 UTC
Post by Clement Ogedengbe
On two occasions in the last 2 weeks, our RADIUS server suddenly
started to reject ALL users. Even though we have set up a
failover system. Unfortunately, the fail-over system did not kick
in because the RADIUS service was still running, only that it
was rejecting all users for some strange reasons.
A reject sent to your NAS means that the NAS believes the RADIUS server
is still there (well, it is...), so it doesn't drop it and fail over.
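
As a toy model of that behaviour (a deliberate simplification, not any particular vendor's logic), a NAS can be pictured as trying its servers in order and moving on only when one does not answer at all. The `ask` callback and the server names are hypothetical.

```python
from enum import Enum


class Reply(Enum):
    ACCEPT = "Access-Accept"
    REJECT = "Access-Reject"
    TIMEOUT = "no response"


def nas_authenticate(servers, ask):
    """Try each server in order; move on only when one does not answer.

    `ask(server)` is a hypothetical callback returning a Reply.
    """
    for server in servers:
        reply = ask(server)
        if reply is not Reply.TIMEOUT:
            # Any answer, including a reject, is final: the server looks alive.
            return server, reply
    return None, Reply.TIMEOUT
```

In this model a primary that rejects everyone keeps all the traffic, while a primary that stays silent hands it to the secondary.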
Post by Clement Ogedengbe
Does anyone know of any monitoring script/tool that can be used
to test that the RADIUS server is authenticating properly and
which can send an alert by email or text in the event that the
server rejects authentication of a valid user credentials a
number of times.
I run a shell script on the RADIUS servers. It:

 - restarts winbind and/or FreeRADIUS if ntlm_auth does not succeed
 - stops FreeRADIUS if auth still fails after the above
 - stops FreeRADIUS if disk usage gets too high

I've had no problems like yours since running this. If there are
problems, FreeRADIUS is forcibly stopped, which means the NAS
jumps on to the next server.
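
The decision flow described above can be sketched roughly as follows. This is not the script from the gist: the three callbacks stand in for whatever actually runs ntlm_auth and controls the services, and the 95% disk limit and the order of the checks are assumptions made for the illustration.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def watchdog(check_auth, restart_services, stop_radius,
             disk_percent: float, disk_limit: float = 95.0) -> str:
    """Run one watchdog pass and return 'ok', 'restarted' or 'stopped'.

    check_auth, restart_services and stop_radius are hypothetical callbacks.
    """
    if disk_percent >= disk_limit:
        stop_radius()          # disk nearly full: take the server out of service
        return "stopped"
    if check_auth():
        return "ok"
    restart_services()         # first failure: try restarting winbind/FreeRADIUS
    if check_auth():
        return "restarted"
    stop_radius()              # still failing: stop so the NAS fails over
    return "stopped"
```

Stopping the daemon on a persistent failure is the point of the design: a silent server is something a NAS can detect, whereas one that keeps sending rejects is not.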

It works for us, but may be full of bugs and eat your system. Use
it at your own risk. There are likely many better solutions out
there, but I've put it on github if you're interested.

https://gist.github.com/mcnewton/8c6c54ffc04acf031a08

We also run Nagios checks against the RADIUS server, so get alerts
from that as well as this script. The Nagios checks use eapol_test
to check the stack that way, but can't stop the RADIUS server if
there has been a problem.
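
For anyone wiring up a similar check, Nagios plugins report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus one line of text. A small sketch of that mapping follows; the function name, the wording of the messages and the two-second "slow" threshold are made up for the example, and the probe itself (eapol_test or similar) is assumed to run elsewhere.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_result(accepted, elapsed: float, warn_after: float = 2.0):
    """Map one probe outcome to (exit_code, status_line).

    accepted: True/False for accept/reject, None if the probe could not run.
    """
    if accepted is None:
        return UNKNOWN, "RADIUS UNKNOWN - probe could not be run"
    if not accepted:
        return CRITICAL, "RADIUS CRITICAL - test user was rejected"
    if elapsed > warn_after:
        return WARNING, f"RADIUS WARNING - accepted but slow ({elapsed:.1f}s)"
    return OK, f"RADIUS OK - accepted in {elapsed:.1f}s"
```

A plugin wrapper would print the status line and exit with the code.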

Matthew
--
Matthew Newton, Ph.D. <***@le.ac.uk>

Systems Specialist, Infrastructure Services,
I.T. Services, University of Leicester, Leicester LE1 7RH, United Kingdom

For IT help contact helpdesk extn. 2253, <***@le.ac.uk>
A***@lboro.ac.uk
2015-02-25 17:29:32 UTC
hi,

we use NAGIOS and have some local eapol_test scripts
for monitoring/alerts, and use 'monit' to check the status of the radius
process and restart it when dead

alan
Arran Cudbard-Bell
2015-02-25 18:44:05 UTC
Post by A***@lboro.ac.uk
hi,
we use NAGIOS and have some local eapol_test scripts
for monitoring/alerts, and use 'monit' to check the status of the radius
process and restart it when dead
You don't need a monitoring solution. Most NAS will fail over
quite happily once the server stops responding.

Check the return code of the failing module and use the
do_not_respond policy.

sql {
    fail = 1
}
if (fail) {
    do_not_respond
}

Do that for all modules critical to authentication.

It's very rare that responding with an Access-Reject on module
failure is an appropriate action. Unfortunately changing the
behaviour in the default config would be very disruptive.

-Arran

Arran Cudbard-Bell <***@freeradius.org>
FreeRADIUS development team

FD31 3077 42EC 7FCD 32FE 5EE2 56CF 27F9 30A8 CAA2
Matthew Newton
2015-02-25 22:24:27 UTC
Post by Arran Cudbard-Bell
Post by A***@lboro.ac.uk
we use NAGIOS and have some local eapol_test scripts
for monitoring/alerts, and use 'monit' to check the status of the radius
process and restart it when dead
You don't need a monitoring solution.
What planet are you visiting from? ;-)

Maybe this would be ok, though:

sql {
    fail = 1
}
if (fail) {
    do_not_respond
    send_admin_emails_until_this_broken_mess_is_fixed
}
Post by Arran Cudbard-Bell
Do that for all modules critical to authentication.
It fixes the problem of NASes hanging onto a RADIUS server that's
broken, sure. But doesn't help you know that you need to fix it!

OTOH, anything but monit. My experience has been along the lines
of "is that service running? Oh great, let's restart it just in
case." Hence replaced by a very small shell script!

Cheers,

Matthew
--
Matthew Newton, Ph.D. <***@le.ac.uk>

Systems Specialist, Infrastructure Services,
I.T. Services, University of Leicester, Leicester LE1 7RH, United Kingdom

For IT help contact helpdesk extn. 2253, <***@le.ac.uk>
Arran Cudbard-Bell
2015-02-25 22:48:00 UTC
Post by Matthew Newton
Post by Arran Cudbard-Bell
Post by A***@lboro.ac.uk
we use NAGIOS and have some local eapol_test scripts
for monitoring/alerts, and use 'monit' to check the status of the radius
process and restart it when dead
You don't need a monitoring solution.
What planet are you visiting from? ;-)
I meant for solving that particular issue :P
Post by Matthew Newton
sql {
    fail = 1
}
if (fail) {
    do_not_respond
    send_admin_emails_until_this_broken_mess_is_fixed
}
You just scrape the logs from the NAS looking for "can't contact RADIUS server" messages :)
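
A rough sketch of that kind of log scraping is below. The exact wording a NAS logs varies by vendor and firmware, so the pattern here (the phrase followed by a server address) and the sample lines in the usage note are purely hypothetical and would need adapting to real logs.

```python
import re
from collections import Counter

# Hypothetical pattern: the real wording depends on the NAS vendor and firmware.
PATTERN = re.compile(r"can't contact RADIUS server\s+(?P<server>\S+)", re.IGNORECASE)


def count_unreachable(lines):
    """Count 'unreachable' complaints per RADIUS server in NAS log lines."""
    hits = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits[match.group("server")] += 1
    return hits
```

Feeding it an invented line such as `nas1 aaa: can't contact RADIUS server 10.1.1.51` twice would give a count of 2 for `10.1.1.51`; an alert could then fire once a count crosses some threshold.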
Post by Matthew Newton
Post by Arran Cudbard-Bell
Do that for all modules critical to authentication.
It fixes the problem of NASes hanging onto a RADIUS server that's
broken, sure. But doesn't help you know that you need to fix it!
OTOH, anything but monit. My experience has been along the lines
of "is that service running? Oh great, let's restart it just in
case." Hence replaced by a very small shell script!
Oh pfft. Monit works fine... ish. Munin is a fun one; I've had one hung RADIUS instance take down all monitoring for the box. I guess that's our fault though, for not implementing a read timeout in radmin.

Suppose I should go and fix that *grumble*.

-Arran
Arran Cudbard-Bell <***@freeradius.org>
FreeRADIUS development team

FD31 3077 42EC 7FCD 32FE 5EE2 56CF 27F9 30A8 CAA2
Matthew Newton
2015-02-25 23:12:02 UTC
Post by Arran Cudbard-Bell
You just scrape the logs from the NAS looking for "can't contact RADIUS server" messages :)
Heh. I call that a monitoring solution :)
Post by Arran Cudbard-Bell
Post by Matthew Newton
OTOH, anything but monit. My experience has been along the lines
of "is that service running? Oh great, let's restart it just in
case." Hence replaced by a very small shell script!
Oh pfft. Monit works fine... ish.
Very much "ish". Maybe I just totally failed configuring it (for
both RADIUS and DHCP servers), but my shell script has been a)
much simpler and b) reliable. I'm all for that any day.

I wasn't joking about monit deciding to restart (actually, stop)
services just because it felt like it. I scrapped it when it
couldn't cope with simple logic.
Post by Arran Cudbard-Bell
Munin is a fun one; I've had one hung RADIUS instance take down
all monitoring for the box. I guess that's our fault though, for
not implementing a read timeout in radmin.
Don't generally have a problem with munin itself. It's one of my
first turn-to things for checking what a system was doing, and
invariably useful. Always find the plugin collection a total mess,
though. Again, that could just be me.
Post by Arran Cudbard-Bell
Suppose I should go and fix that *grumble*.
Maybe. A monitoring solution should be able to cope with that sort
of thing.

Matthew
--
Matthew Newton, Ph.D. <***@le.ac.uk>

Systems Specialist, Infrastructure Services,
I.T. Services, University of Leicester, Leicester LE1 7RH, United Kingdom

For IT help contact helpdesk extn. 2253, <***@le.ac.uk>
Clement Ogedengbe
2015-02-26 08:28:50 UTC
Thanks to Matthew for the shell script. It's brilliant, as it perfectly meets our needs (with a few tweaks, though).


Clement


John Douglass
2015-03-02 16:09:13 UTC
Post by Clement Ogedengbe
On two occasions in the last 2 weeks, our RADIUS server suddenly started to reject ALL users. Even though we have set up a failover system. Unfortunately, the fail-over system did not kick in because the RADIUS service was still running, only that it was rejecting all users for some strange reasons.
Does anyone know of any monitoring script/tool that can be used to test that the RADIUS server is authenticating properly and which can send an alert by email or text in the event that the server rejects authentication of a valid user credentials a number of times.
Best Regards
Clement Ogedengbe
Clement,

At Georgia Tech we are currently refining our RADIUS monitoring
services. We are finding that using eapol_test is not enough when
debugging the variety of failure scenarios that can occur. I
will be writing some updated PHP monitoring objects; if enough people
are interested in how we will be monitoring this service for its various
failure scenarios, I'm glad to share them. We've had a lot of people
looking into these issues and working through our pain points.

Just to give you some background: we currently have 4 hardware
RADIUS servers and 1 VM RADIUS server deployed, and we are using 1 "shared"
AD server and 2 "dedicated" VM AD servers (which only our RADIUS servers
communicate with, via Samba).

I'm not here to argue over the use of EAP-PEAP-MSCHAPv2 vs EAP-TLS; we all live
within our environments and make them the best they can be.

If you are using the configuration of Controller -> Radius ->
Samba/ntlm_auth -> AD, here are a number of things we have come across
that you need to consider:

1) The samba joins to AD are somewhat brittle.

There have been instances when our samba service (winbind) has
completely lost its privileges to AD. This has happened under numerous
versions and has happened at random times. I'm sure it's something to do
with the renegotiation of keys between the joined samba machine and the
AD servers.

When this "permission" issue occurs, radius is running peachy keen but
the ntlm_auth calls return failures. I do not believe that this
particular error message ends up in the logs, but it does show up on
the command line. Something like:

ntlm_auth --username=someuser --request-nt-key

will generate a final error message of:

NT_STATUS_ACCESS_DENIED: Access denied (0xc0000022)

An eapol_test will simply tell you that the authentication failed (it's
a yes or no answer) but not why without going into either debug mode or
running the ntlm_auth command by hand alongside it and capturing that
output.
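
To capture the "why" next to the yes/no answer, a monitoring script could pull the NT_STATUS code out of whatever ntlm_auth printed. This is only a minimal Python sketch, assuming the error line has the same shape as the one quoted above; the name parse_nt_status is hypothetical and not part of any Samba or FreeRADIUS tool:

```python
import re

# Matches lines shaped like:
#   NT_STATUS_ACCESS_DENIED: Access denied (0xc0000022)
NT_STATUS_RE = re.compile(
    r"(NT_STATUS_[A-Z0-9_]+)(?::\s*(.*?))?\s*\((0x[0-9a-fA-F]+)\)"
)


def parse_nt_status(output):
    """Return (status_name, description, hex_code) for the last NT_STATUS
    line found in `output`, or None if there is no such line."""
    found = None
    for line in output.splitlines():
        match = NT_STATUS_RE.search(line)
        if match:
            found = (match.group(1), match.group(2) or "", match.group(3))
    return found


if __name__ == "__main__":
    sample = "NT_STATUS_ACCESS_DENIED: Access denied (0xc0000022)"
    print(parse_nt_status(sample))
```

The returned status name can then go straight into the alert text, so whoever gets paged sees "access denied" rather than just "auth failed".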

If anyone has some experience in the above failures of Samba joins to
AD, I'd love to hear it.

2) AD servers get overloaded when we are talking about large numbers of
users. We are still learning what our load limits are. Our device base
is anywhere between 20k and 25k+, served by two dedicated AD VMs (scaled
large), with radius requests coming from a controller with about 250 APs
(we had to scale down due to a radius flaw in the controller software,
for which we have been testing a fix for Cisco).

You absolutely need to use the maxConcurrentApi connection setting if
you are doing any sort of large user transactions.

http://support.microsoft.com/kb/2688798

3) The number of (default) connections from winbind to AD is limited.
You need to use modern versions of samba (look at EnterpriseSamba.org
for modern packages) and a properly configured smb.conf. I have found
that these settings work for us. Are they optimal/ideal? I have no
idea :) But they seem to work well-ish. If you are serious about using
samba/winbind you need to be using the latest 4.1.17, which fixes the
following issues:

a) Security flaw in smbd (even though we don't run it, just nmbd and
winbind, it's better to have the fix!)
b) Connections to AD dynamically grow and shrink as the need arises
(versions before 4.1.16 did not do this well, and versions before, I
think, 4.1.12 did not do it at all; they just increased the number of
connections during a spike and left them hanging around, eventually
causing problems).

My settings (probably allll sorts of wrong as I am far from a samba
expert) are:

winbind max clients = 16192
winbind max domain connections = 128
winbind request timeout = 30
winbind reconnect delay = 5
log level = 4
syslog = 6

(I log my samba stuff so I can look deeper into authentication issues).

Most of the issues we have found are rarely with Freeradius. The only
complaint about Freeradius is that it is sometimes hard to correlate the
radius error messages to the authentication requests so that we can see
"this error X caused this authentication on line Y to fail".

So long story short:

1) Use eapol_test as a base with a known "good" username/password pair
2) Use the results of the ntlm_auth command as an additional test, and
for further information on what might be going wrong with that same
username/password pair, if eapol_test fails
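
The two steps above, plus the "alert after a number of rejects" part of the original question, can be sketched roughly as follows in Python. This is not the Georgia Tech tooling, just an illustration under assumptions: that eapol_test exits non-zero when authentication fails (worth verifying against your build), and that the helper names run, check_auth and FailureTracker are made up for this example:

```python
import subprocess


def run(cmd, timeout=30):
    """Run a command; return (exit_code, combined stdout+stderr)."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=timeout)
        return proc.returncode, proc.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary or a hang counts as a failed check.
        return 1, str(exc)


def check_auth(eapol_cmd, ntlm_cmd, runner=run):
    """Step 1: eapol_test with a known-good account.
    Step 2: only if that fails, run ntlm_auth for the same account
    and hand back its output as the diagnostic detail."""
    code, _ = runner(eapol_cmd)
    if code == 0:
        return True, ""
    _, detail = runner(ntlm_cmd)
    return False, detail.strip()


class FailureTracker:
    """Signals an alert once `threshold` consecutive checks have failed."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok):
        """Record one check result; return True when an alert is due."""
        self.failures = 0 if ok else self.failures + 1
        # True exactly once per outage, so the alert is not repeated.
        return self.failures == self.threshold
```

A cron job or loop would call check_auth with something like ["eapol_test", "-c", "/path/to/peap.conf", "-a", "192.0.2.10", "-s", "secret"] and ["ntlm_auth", "--username=testuser", "--password=...", "--request-nt-key"] (paths, address, secret and account are placeholders), feed the result to FailureTracker.record(), and send the email or text when it returns True. Keep in mind that a password given on the command line is visible in the process list.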

- JohnD


Phil Mayers
2015-03-02 17:28:14 UTC
Permalink
Post by John Douglass
1) The samba joins to AD are somewhat brittle.
Yep. We check this with a passive nagios service which basically does this:

    for attempt in 1 2 3 4:
        wbinfo -t
        if success:
            break

        # sometimes the pipe just times out harmlessly; retry
        # if we see that kind of error message, and only that
        if NT_STATUS_PIPE_NOT_AVAILABLE in output:
            sleep 1
            continue

        # failed, and not an ignorable error message; fall through
        break

    if not success:
        service fail

    net -P ads status
    if not success:
        service fail
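
For anyone who wants something runnable, the pseudocode above could look roughly like this in Python. It is a sketch rather than the actual nagios check: the function names are invented for illustration, success is judged purely by exit status, and the runner argument exists only so the logic can be exercised without a domain join:

```python
import subprocess
import time


def run(cmd):
    """Run a command; return (succeeded, combined stdout+stderr)."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=30)
        return proc.returncode == 0, proc.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)


def check_trust(runner=run, attempts=4, delay=1.0):
    """wbinfo -t, retried only while the failure is the harmless
    NT_STATUS_PIPE_NOT_AVAILABLE timeout."""
    ok = False
    for _ in range(attempts):
        ok, output = runner(["wbinfo", "-t"])
        if ok:
            break
        if "NT_STATUS_PIPE_NOT_AVAILABLE" in output:
            time.sleep(delay)
            continue
        break  # a real failure, not an ignorable one; do not retry
    return ok


def check_join(runner=run, delay=1.0):
    """Trust secret check first, then the AD join status check."""
    if not check_trust(runner, delay=delay):
        return False
    ok, _ = runner(["net", "-P", "ads", "status"])
    return ok
```

Used as an active plugin, a wrapper would exit 0 when check_join() returns True and 2 (CRITICAL) otherwise; for a passive service the same result is submitted to nagios instead.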
Post by John Douglass
There have been instances when our samba service (winbind) has
completely lost its privileges to AD. This has happened under numerous
versions and has happened at random times. I'm sure it's something to do
with the renegotiation of keys between the joined samba machine and the
AD servers.
This can happen with Windows servers too; we've had Windows 2012R2
member servers fall out of the domain. It's not Samba-specific; it
appears to be related to AD replication issues occurring at the same
time as an AD machine account password event. But this is all hypothesis
- we haven't proven it.

If you do a bit of googling, you'll see a lot of people run into it.
It's just a bit of Microsoft nonsense we all have to live with :o(
Post by John Douglass
If anyone has some experience in the above failures of Samba joins to
AD, I'd love to hear it.
See above!
Post by John Douglass
2) AD servers get overloaded when we are talking about large numbers of
users. We are still learning what our load limits are. Our devicebase is
anywhere between 20k-25k+ with two dedicated AD vms (scaled large) with
radius requests from controller with about 250 APs (we had to scale down
due to a radius flaw in the controller software that we have been
testing a fix for Cisco for).
There has been some discussion about this on -devel recently. Matthew
Newton has a patch which runs ntlm_auth in "pipe" mode, avoiding the
overhead of a fork/exec/startup on each auth. This seems to make a
substantial difference - you might want to check the patch out.

Short version: it might not be AD. It might be the overhead of starting
ntlm_auth on every mschap request.
Post by John Douglass
You absolutely need to use the maxConcurrentApi connection setting if
you are doing any sort of large user transactions.
http://support.microsoft.com/kb/2688798
This is AD version dependent. We do *not* have it set, and seem to run
without problem, but are on Windows 2012R2 where the default is different.