New network layer

When I was working on the MLN server, one thing that made development much easier was that the lowest levels were taken care of automatically, as much of the work had already been done by the TCP and HTTP protocols. In contrast, my work on the LU server has meant starting at the bottom and implementing the RakNet protocol, which takes the role of TCP in LU’s networking. My experience with the MLN server got me thinking that it might be possible to redesign LU’s networking so that the benefits of MLN’s networking could also be applied to LU. This is what led me to develop a new network layer, which I’ve now finished work on.

The protocols involved here sit at the transport layer of networking. Two protocols are in wide use at this layer: UDP and TCP. These protocols are involved in basically all traffic on the internet. UDP is an extremely simple protocol; essentially all it does is provide ports, which associate network packets with a specific process on a computer. At this layer, network traffic is still very unreliable: packets sent from one computer may get lost in the network and never arrive at the destination. They may also arrive out of order, which is a problem for many applications. UDP does not attempt to solve these problems, so packets sent over UDP are unreliable. TCP, however, implements mechanisms under the hood to ensure that packets arrive, and arrive in the proper order. A lot of applications (websites, email, etc.) need packets to be reliable, but there are some, like DNS, for which unreliable packets are actually useful. For these applications the choice is usually clear: they go either fully unreliable or fully reliable.
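
To make the difference concrete, here is a minimal Python sketch that sends data over both kinds of socket on localhost. The function names are made up for this illustration. On the loopback interface the UDP datagram will practically always arrive; over a real network it might not, and nothing in UDP would notice.

```python
import socket


def udp_roundtrip(payload):
    """Send one datagram to a local UDP socket and read it back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    receiver.settimeout(2.0)         # UDP gives no delivery guarantee, so don't wait forever
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No connection setup: the datagram is simply addressed to a port.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_roundtrip(chunks):
    """Send several chunks over one TCP connection and read the whole stream."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _addr = listener.accept()
    server_side.settimeout(2.0)
    try:
        for chunk in chunks:
            client.sendall(chunk)
        client.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
        received = b""
        while True:
            part = server_side.recv(4096)
            if not part:
                break
            received += part
        # TCP delivers the bytes reliably and in order, but as one stream:
        # the boundaries between the original chunks are not preserved.
        return received
    finally:
        client.close()
        server_side.close()
        listener.close()
```

Note that TCP hands back a single ordered byte stream rather than individual packets, so a protocol built on it has to mark packet boundaries itself.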

Games, however, pose a special case in that they sometimes need reliable packets and sometimes need unreliable ones. Here are some examples from LU:

  • Logging in to the game needs to be a reliable packet, because it’s important that the server receives the login request – if not, you’d be stuck at the login screen forever.
  • The position updates sent by the client when the player walks around don’t actually need to reach the server 100% of the time – they are sent so frequently that it doesn’t matter if one is lost. Therefore position updates are more suited to unreliable packets.

Therefore LU can’t just pick one of UDP or TCP and stick with it; it needs features from both. This is the problem the RakNet protocol tries to solve: it allows LU to choose between unreliable and reliable transmission for every packet. RakNet uses UDP as a base and builds mechanisms on top that ensure reliability. But unlike TCP, these mechanisms are optional and can be enabled per packet.

This seems like a solid solution for this problem. However, RakNet has a few quirks in its implementation that make using it less than optimal:

  • In addition to the distinction between “reliable” and “unreliable”, RakNet also supports some more fine-grained options that are somewhere between raw UDP and a full TCP implementation. However, at least in LU’s case, these special modes are almost never used – it’s usually all or nothing.
  • RakNet has some flaws in its protocol design that make it unnecessarily complicated; the same could be achieved more elegantly.
  • RakNet is a niche protocol, with far less support than UDP or TCP. This makes it more likely to have bugs and inefficient implementations.
  • LU’s copy of the RakNet implementation is statically compiled into the client .exe, which means it can’t be updated, especially since LU itself is no longer updated.
  • RakNet implements its own combination of cryptography to encrypt its connection, and the encryption isn’t enforced. Together with the fact that RakNet is niche and more than 10 years old at this point, the protocol cannot be said to be secure.

Thus, there’s room for improvement in LU’s networking situation. While working on the MLN server, I had an idea for a possible solution: the same kind of per-packet distinction could be achieved by using both UDP and TCP simultaneously, sending on UDP when unreliable packets are needed and on TCP when reliable packets are needed. The solution seemed so simple and obvious that the real surprise was that LU wasn’t using it already.
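
The idea can be sketched in a few lines of Python. This is only an illustration, not the wire format of the actual protocol: the `DualChannel` class, the 4-byte length prefix, and the helper names are all assumptions made for this example.

```python
import socket
import struct


class DualChannel:
    """Client side: one TCP connection for reliable packets, one UDP socket for the rest."""

    def __init__(self, host, tcp_port, udp_port):
        self.tcp = socket.create_connection((host, tcp_port))
        self.udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.udp_addr = (host, udp_port)

    def send(self, payload, reliable):
        if reliable:
            # TCP is a byte stream, so prefix each packet with its length
            # (hypothetical framing) to let the receiver find the boundaries.
            self.tcp.sendall(struct.pack("<I", len(payload)) + payload)
        else:
            # A UDP datagram is already one self-contained packet.
            self.udp.sendto(payload, self.udp_addr)

    def close(self):
        self.tcp.close()
        self.udp.close()


def recv_exact(conn, n):
    """Read exactly n bytes from a TCP connection."""
    buf = b""
    while len(buf) < n:
        part = conn.recv(n - len(buf))
        if not part:
            raise ConnectionError("stream closed mid-packet")
        buf += part
    return buf


def read_reliable(conn):
    """Read one length-prefixed packet from the TCP side."""
    (length,) = struct.unpack("<I", recv_exact(conn, 4))
    return recv_exact(conn, length)


def demo():
    """Run both channels against a throwaway local 'server' and return what it received."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    udp_server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp_server.bind(("127.0.0.1", 0))
    udp_server.settimeout(2.0)
    channel = DualChannel("127.0.0.1", listener.getsockname()[1], udp_server.getsockname()[1])
    conn, _addr = listener.accept()
    conn.settimeout(2.0)
    try:
        channel.send(b"login", reliable=True)      # must arrive
        channel.send(b"position", reliable=False)  # fine to lose now and then
        return read_reliable(conn), udp_server.recvfrom(2048)[0]
    finally:
        channel.close()
        conn.close()
        udp_server.close()
        listener.close()
```

Calling `demo()` sends a login packet over TCP and a position update over UDP, mirroring the two LU examples above.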

However, even though this seems straightforward in theory, LU’s situation made it quite complicated to actually implement in practice. We can’t just switch LU from one protocol to another, since RakNet is baked into the program and can’t be removed. Therefore, what I’ve been working on the past few months is a “shim”: A program running on the client side, acting as a local server for the RakNet protocol, translating the traffic to the TCP/UDP based protocol, and relaying it to the actual server. Due to the way LU works, this has been a bit more complicated than initially thought, as the shim also needs to simulate multiple server instances for when the player switches worlds. However, after a lot of development and testing, I’ve been able to get it fully working.
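
At the heart of such a shim is a decision about which channel each RakNet packet should be forwarded on. The sketch below shows one way that decision could look. The enum names follow RakNet’s `PacketReliability` as far as I know it, but the numeric values and the routing rule itself are assumptions for illustration, and all the actual RakNet parsing (acknowledgements, split packets, and so on) is left out.

```python
from enum import IntEnum


class Reliability(IntEnum):
    """RakNet-style reliability modes (values assumed for this sketch)."""
    UNRELIABLE = 0
    UNRELIABLE_SEQUENCED = 1
    RELIABLE = 2
    RELIABLE_ORDERED = 3
    RELIABLE_SEQUENCED = 4


def route(reliability):
    """Pick the outgoing channel for a packet: 'udp' or 'tcp'."""
    if reliability in (Reliability.UNRELIABLE, Reliability.UNRELIABLE_SEQUENCED):
        # Plain UDP does not drop stale out-of-order packets the way a
        # sequenced mode would; this sketch accepts that simplification.
        return "udp"
    # All reliable modes collapse onto TCP, which is both reliable and ordered.
    return "tcp"
```

Collapsing the in-between modes like this fits the observation that LU almost never uses them.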

With the TCP/UDP protocol fully working, it was straightforward to swap out the TCP protocol for the encrypted TLS protocol, which is widely used, for example to secure HTTPS connections. With the protocol now using this cryptographic industry standard, LU’s connection security rests on a widely vetted foundation rather than on RakNet’s custom scheme.
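
As an illustration of how small that swap can be, here is a sketch using Python’s standard `ssl` module. It only covers the TCP side, the function names are hypothetical, and the choice of TLS 1.2 as the minimum version is an assumption of this example rather than something the protocol specifies.

```python
import socket
import ssl


def make_client_context():
    """Build a TLS client context with certificate and hostname checks enabled."""
    ctx = ssl.create_default_context()  # defaults: CERT_REQUIRED and check_hostname=True
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx


def open_reliable_channel(host, port, ctx):
    """Open the reliable channel as a TLS connection instead of plain TCP."""
    raw = socket.create_connection((host, port))
    try:
        # From here on, reads and writes on the returned socket are encrypted.
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

The rest of the code can keep using `sendall` and `recv` on the wrapped socket exactly as it did on the plain one.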

Additionally, the UDP and TCP implementations are provided by the operating system, so they can be improved without the program having to change. This also means that they are heavily optimized, as the same implementations are used for things like high-speed file transfer and video streaming.

I’ve done a few unscientific benchmarks, loading worlds in LU with the old RakNet protocol and the new TCP/UDP protocol, and there are some quite noticeable improvements:

Values are time from when the client signals clientside load completion to the end of the loading screen, on localhost, without encryption, in seconds:

World             Old time   New time
Venture Explorer  4          2
Crux Prime        27         2
Avant Gardens     36         3

Edit: It seems my connection or my ISP’s firewall may have significantly slowed down RakNet’s loading times in this test. It appears RakNet can be significantly faster on better connections, but the new protocol still seems to outperform it there.

Previously, loading a world could be quite slow, mostly because the congestion control and retransmission timeout algorithms used in my RakNet implementation seemed to have trouble estimating two values: how many packets to send per second, and how long to wait before resending a packet. With the new implementation, everything is lightning fast, and the complexity is outsourced to the TCP implementation of the operating system.
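
For a sense of what is being outsourced, here is a sketch of the standard retransmission timeout calculation that TCP uses, as described in RFC 6298. This is not the algorithm from my RakNet implementation, and real operating systems use tuned variants; the class and parameter names are chosen for this example.

```python
class RtoEstimator:
    """Retransmission timeout estimator following the formulas in RFC 6298."""

    ALPHA = 1 / 8  # weight of a new sample in the smoothed round-trip time
    BETA = 1 / 4   # weight of a new sample in the round-trip time variance
    K = 4          # how many "variances" of safety margin to add

    def __init__(self, granularity=0.001, min_rto=1.0, max_rto=60.0):
        self.granularity = granularity  # clock granularity G, in seconds
        self.min_rto = min_rto          # the RFC recommends a 1 second floor
        self.max_rto = max_rto
        self.srtt = None                # smoothed round-trip time
        self.rttvar = None              # round-trip time variance
        self.rto = 1.0                  # initial timeout before any measurement

    def on_rtt_sample(self, rtt):
        """Update the estimate with a newly measured round-trip time (seconds)."""
        if self.srtt is None:
            # First measurement: initialise both values from it.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variance is updated first, using the old smoothed value.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto

    def on_timeout(self):
        """A packet timed out: back off exponentially before the next resend."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto
```

If the timeout is estimated too high, lost packets stall the transfer while waiting for a resend; if it is too low, packets are resent needlessly and add to the congestion.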

Next steps

However, it’s possible to make these improvements even better. To keep backwards compatibility with the RakNet protocol, right now the shim needs to run in the background all the time as a visible console application. The client’s boot.cfg also needs to be changed to configure the client to connect to the shim instead of connecting to the server directly. This isn’t as user-friendly as I’d like it to be. The complexity of running the shim and translating between the protocols also adds some overhead, though it’s usually small compared to network latency.

However, these things can be fixed. As I mentioned above, the RakNet implementation is baked into the client .exe, and replacing it would normally require recompiling the client. That isn’t the whole story, though – there is a way of modifying the client to run other code instead of RakNet’s network functions: DLL hooking. DLL hooking works by getting the client to load a dynamically linked library at startup. The code in this library then has access to the internals of the client process, and with some surgical precision, it can find the location of the machine instructions of the RakNet functions and hijack them to run its own code instead. This technique has been used successfully in a number of games to mod them without source code. However, it’s also much more intricate and complicated than the shim solution, which is why I chose to focus on completing the shim first, to show that this kind of protocol replacement is possible and valuable. Now that that’s done, I can look further into a DLL-hooking-based solution, which should in theory provide even better speeds with less overhead.

In the meantime, I’ll be releasing both the shim executable and source code in the next few days, so that other projects are also able to make use of this new protocol.

Alpha

The network layer has been very important to my decision about opening the ongoing Alpha test to new players. Some alpha testers have previously reported being randomly disconnected from the server, which this new protocol should fix. Along with the faster loading times and the increased security, this makes the new protocol vital to improving the testers’ experience. It has therefore been an absolute requirement for me before a new alpha opening. With the protocol completed, the outlook for an alpha opening now looks much better… but more on that in the next progress report 😉

See you all in-game!
– lcdr

4 Replies to “New network layer”

  1. I am VERY excited for this new network protocol, AND potentially, a new alpha phase. I never got to enjoy this game in its full extent due to me often abstaining enemy combat, and the premium restrictions. However, I have a rather slim chance of making it into the alpha testing crew.

  2. To be fair, you are using a custom RakNet implementation for the server, not to mention that the version used for LU is relatively old (especially by the time it got shut down). So I wouldn’t necessarily say that RakNet itself is as flawed as you make it out to be here.

    I mean obviously the issue of using an outdated version would be next to impossible to fix as you explained, but if you really want to do a more accurate comparison I think you should at least use the original RakNet library on the server side too (disregarding the fact that it would be a bit tedious to change the current server implementation in that regards).

    Using standard TCP/UDP traffic might sound good compatibility and safety wise but there are reasons why many games use custom or 3rd party implementations like RakNet instead, and in most cases they are performance and efficiency.

    Which is why I’m a bit surprised by the obvious improvement in your time comparison (even if it’s rudimentary for now). But since you seem to have most of it already implemented I guess we’ll see how it works out, usually the more interesting results happen under heavy load anyway.

    Btw I don’t remember if I ever discussed this with you but I already wrote a dll injector of BFBC2 once. If you are interested or just want to talk about stuff feel free to drop me a mail (as I’m pretty busy nowadays and only check my stuff very infrequently) 😉

    1. Hey, sorry for the late response, I didn’t notice the comments on here. In any case, glad to hear from you again 🙂

      You’re right, most of the performance difference comes from the fact that I’m using a custom implementation of RakNet on the server. I’ve briefly mentioned that in the performance section, but I admit that I didn’t really elaborate on it much. The original implementation of RakNet would likely have fared much better in this comparison.

      However my primary objective with regards to performance was to improve the loading speed of the current server, which is using the custom implementation, and so it was more useful to me to compare with that. I’d provide some additional values for the original RakNet, but the server isn’t built to interface with it, so I can’t easily set up a comparable testcase.
      The issues I’ve listed about RakNet also still remain, although I should have been more precise about that in the post – they are issues specific to the old RakNet 3.25 version and its use in LU. Newer RakNet versions are probably without these issues, but I can’t say much about them, I’ve only worked with the RakNet version used in LU.

      I do remember our discussion of the dll injector/hook that one time – it’s actually what gave me the idea of using that approach here 😉 In the time since I’ve written this post I’ve been able to construct a basic dll hook that hooks the relevant raknet functions, but it’s still far from complete and I haven’t been able to work on it much due to IRL stuff getting in the way. I’d definitely be interested in talking about it and other stuff, I’ll drop you a mail.
