New network layer

When I was working on the MLN server, one thing that made development much easier was that the lowest levels were taken care of automatically, as much of the work had already been done by the TCP and HTTP protocols. In contrast, my work on the LU server meant starting at the bottom and implementing the RakNet protocol, which takes the role of TCP in LU’s networking. My experience with the MLN server got me thinking that it might be possible to redesign LU’s networking so that the benefits of MLN’s networking could also be applied to LU. This is what led me to develop a new network layer, which I’ve now finished work on.

The protocols involved here sit at the transport layer of networking. At this layer, two protocols are in wide use: UDP and TCP. These protocols are involved in practically all traffic on the internet. UDP is an extremely simple protocol; essentially all it does is provide ports, which associate network packets with a specific process on a computer. At this layer, network traffic is still very unreliable: packets sent from one computer may get lost in the network and never arrive at the destination. They may also arrive out of order, which matters for many applications. UDP does not attempt to solve these problems, and therefore packets sent over UDP are unreliable. TCP, however, implements mechanisms under the hood to ensure that packets arrive reliably and in the proper order. A lot of applications (websites, email, etc.) need packets to be reliable, but there are some for which unreliable packets are actually useful, like DNS. For these applications the choice is usually clear: they’ll go either fully unreliable or fully reliable.

Games however pose a special case in that they sometimes need reliable packets, and sometimes need unreliable ones. Here are some examples from LU:

  • Logging in to the game needs to be a reliable packet, because it’s important that the server receives the login request – if not, you’d be stuck at the login screen forever.
  • The position updates sent by the client when the player walks around don’t actually need to reach the server 100% of the time – they are sent so frequently that it doesn’t matter if one is lost. Therefore position updates are more suited to unreliable packets.

Therefore LU can’t just decide on one of UDP or TCP and stick with it; it needs features from both. This is the problem the RakNet protocol tries to solve: it allows LU to choose between unreliable and reliable transmission for every packet. RakNet uses UDP as a base and builds mechanisms on top that ensure reliability. But unlike TCP, these mechanisms are optional and can be enabled per packet.

This seems like a solid solution for this problem. However, RakNet has a few quirks in its implementation that make using it less than optimal:

  • In addition to the distinction between “reliable” and “unreliable”, RakNet also supports some more fine grained options that are somewhere between raw UDP and a full TCP implementation. However, at least in LU’s case, these special modes are almost never used – it’s usually all or nothing.
  • RakNet has some flaws in its protocol design that make it unnecessarily complicated; the same results could be achieved more elegantly.
  • RakNet is a niche protocol, with far less support than UDP or TCP. This makes it more likely to have bugs and inefficient implementations.
  • LU’s copy of the RakNet implementation is statically compiled into the client .exe, which means it can’t be updated, especially since LU itself is no longer updated.
  • RakNet implements its own combination of cryptography to encrypt its connection, and the encryption isn’t enforced. Together with the fact that RakNet is niche and more than 10 years old at this point, the protocol cannot be said to be secure.

Thus, there’s room for improvement for LU’s networking situation. While working on the MLN server, I had an idea for a possible solution: The same kind of per-packet distinction could be achieved by using both UDP and TCP simultaneously, sending on UDP when unreliable packets were needed, and on TCP when reliable packets were needed. The solution seemed both simple and obvious enough that it was more a surprise that LU wasn’t using it already.
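To make the idea concrete, here is a minimal sketch of a client that keeps one TCP connection and one UDP socket open and picks between them per packet. It is an illustration only, not the shim's actual code: the class name `DualChannelClient`, the helper `recv_framed`, and the 4-byte length prefix used to separate packets on the TCP stream are all assumptions made for this example.

```python
import socket


class DualChannelClient:
    """Hypothetical client: reliable packets go over TCP, unreliable ones over UDP."""

    def __init__(self, host: str, tcp_port: int, udp_port: int) -> None:
        self._tcp = socket.create_connection((host, tcp_port))
        self._udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._udp_addr = (host, udp_port)

    def send(self, payload: bytes, reliable: bool) -> None:
        if reliable:
            # TCP is a byte stream without packet boundaries, so each packet
            # is prefixed with its length (an assumed framing for this sketch).
            self._tcp.sendall(len(payload).to_bytes(4, "big") + payload)
        else:
            # UDP keeps packet boundaries but may drop or reorder packets.
            self._udp.sendto(payload, self._udp_addr)

    def close(self) -> None:
        self._tcp.close()
        self._udp.close()


def _recv_exact(conn: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes from a TCP connection."""
    buffer = b""
    while len(buffer) < count:
        chunk = conn.recv(count - len(buffer))
        if not chunk:
            raise ConnectionError("connection closed mid-packet")
        buffer += chunk
    return buffer


def recv_framed(conn: socket.socket) -> bytes:
    """Receiving side of the framing: read one length-prefixed packet."""
    length = int.from_bytes(_recv_exact(conn, 4), "big")
    return _recv_exact(conn, length)
```

With this shape, a login request would be sent with `reliable=True` and a position update with `reliable=False`, matching the two LU examples given earlier.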

However, even though this seems straightforward in theory, LU’s situation made it quite complicated to actually implement in practice. We can’t just switch LU from one protocol to another, since RakNet is baked into the program and can’t be removed. Therefore, what I’ve been working on the past few months is a “shim”: A program running on the client side, acting as a local server for the RakNet protocol, translating the traffic to the TCP/UDP based protocol, and relaying it to the actual server. Due to the way LU works, this has been a bit more complicated than initially thought, as the shim also needs to simulate multiple server instances for when the player switches worlds. However, after a lot of development and testing, I’ve been able to get it fully working.

With the TCP/UDP protocol fully working, it was straightforward to swap out the TCP protocol for the encrypted TLS protocol, which is widely used, for example to secure HTTPS connections. With the protocol now using this cryptographic industry standard, LU’s security is now rock solid.
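Since TLS runs on top of an ordinary TCP stream, the swap can be quite small in code. The following is a rough sketch using Python's standard `ssl` module; the function names are made up for this example, and how the real shim and server handle certificates is not covered here. The sketch only concerns the stream connection.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context() turns on certificate and hostname verification.
    return ssl.create_default_context()


def open_tls_connection(host: str, port: int) -> ssl.SSLSocket:
    """Open a TCP connection and wrap it in TLS (host and port are placeholders)."""
    raw = socket.create_connection((host, port))
    return make_client_context().wrap_socket(raw, server_hostname=host)
```

The returned socket can then be used like the plain TCP socket it replaces, with `sendall` and `recv`.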

Additionally, the UDP and TCP implementations are provided by the operating system, so they can be improved without the program having to change. This also means that they are optimized to the full extent possible, as the same implementations are used for things like high speed file transfer and video streaming.

I’ve done a few unscientific benchmarks, loading worlds in LU with the old RakNet protocol and the new TCP/UDP protocol, and there are some quite noticeable improvements:

Values are time from when the client signals clientside load completion to the end of the loading screen, on localhost, without encryption, in seconds:

World              Old time (s)   New time (s)
Venture Explorer   4              2
Crux Prime         27             2
Avant Gardens      36             3

Edit: It seems my connection or my ISP’s firewall may have significantly slowed down RakNet’s loading times in this test. It appears RakNet can be significantly faster on better connections, but the new protocol still seems to outperform it there.

Previously, loading a world could be quite slow, mostly because the congestion control and retransmission timeout algorithms used in my RakNet implementation seemed to have trouble estimating good values for how many packets to send per second and how long to wait before resending packets. With the new implementation, everything is lightning fast, and the complexity is outsourced to the TCP implementation of the operating system.

Next steps

However, it’s possible to make these improvements even better. To keep backwards compatibility with the RakNet protocol, right now the shim needs to run in the background all the time as a visible console application. The client’s boot.cfg also needs to be changed to configure the client to connect to the shim instead of connecting to the server directly. This isn’t as user-friendly as I’d like it to be. The complexity of running the shim and translating between the protocols also adds some overhead, though it’s usually small compared to network latency.

However, these things can be fixed. As I’ve mentioned above, the RakNet implementation is baked into the client .exe, and replacing it would normally require recompiling the client. Strictly speaking, though, it isn’t 100% true that it can’t be replaced – there is a way of modifying the client to run other code instead of RakNet’s network functions: DLL hooking. DLL hooking works by getting the client to load a dynamically linked library at startup. The code in this library then has access to the internals of the client process, and with some surgical precision, it can find the location of the machine instructions of the RakNet functions and hijack them to run its own code instead. This technique has been used successfully in a number of games to mod them without source code. However, it’s also much more intricate and complicated than the shim solution, which is why I chose to focus on completing the shim first to show that this kind of protocol replacement is possible and valuable. Now that that’s done, I can look further into a DLL-hooking-based solution, which should in theory be able to provide the same speeds with less overhead.

In the meantime, I’ll be releasing both the shim executable and source code in the next few days, so that other projects are also able to make use of this new protocol.

Alpha

The network layer has been very important to me in making a decision about opening the ongoing Alpha test to new players. Some alpha testers have previously reported getting randomly disconnected from the server sometimes, which this new protocol should fix. Along with the faster loading times and the increased security, this new protocol is vital in improving the user experience of the testers. Therefore it has been an absolute requirement to me for a new alpha opening. With the protocol completed, the outlook for an alpha opening now looks much better… but more on that in the next progress report 😉

See you all in-game!
– lcdr

Progress report: October 2017 – January 2018

So, by now it’s been three months since the alpha started. Time for some updates on what’s happened in the last three months.

You’ve probably noticed that there haven’t been too many updates on progress lately. That’s partially because I’ve been pretty busy and haven’t had as much time in the last months, but also simply because alpha progress isn’t as easy to demonstrate as feature implementation progress. There’s been a bunch of improvements since the alpha started, but most of them have been minor or internal, and not the typical stuff we can show in our videos. However, by now there are enough that I can talk about them in more detail.

So here we go, a comprehensive list of bugs that were resolved and features that were implemented since the start of the alpha:

(Issue titles usually appear as the tester submitted them)

October

  • Pyraknet: Issue: Duplicate detection not working properly.

    Pyraknet is the network library implementing the RakNet 3.25 protocol that provides the low-level network layer that LU uses for network communication.
    It needs to resend packets when the client doesn’t receive them for some reason, and it needs to be able to detect and throw away resends it doesn’t need when it did receive the packet.
    In a rare case (only 1 message used by LU was using this packet type) the duplicate detection didn’t work. This patch fixed this problem.

  • Issue: Block Yard doesn’t work.

    Most objects on Block Yard didn’t appear for some reason. It turned out this was due to a change in the server’s spawning logic, which, ironically, actually was implementing a new feature, but invalidated an assumption made by Block Yard’s world script. The issue was quickly fixed with some changes to the script.

  • Implemented: Achievements for collecting powerups.

    A rare achievement type, used only by 3 achievements in the whole game, was implemented after a tester noticed the achievements not working properly.

  • Issue: Rocket Take-off uses another rocket than the one placed.

    Selecting a rocket using the mouse rather than pressing shift wasn’t working correctly, and got fixed.

  • Issue: Imaginite isn’t taken by mission.

    The mission in Nimbus station where you can trade imaginite for backpack space wasn’t removing the imaginite properly. Fixed.
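As a rough illustration of the duplicate detection mentioned in the first pyraknet item above: a reliable layer resends packets that may have been lost, so the receiver has to recognize resends of packets it already has. The sketch below is not pyraknet's code; the class name and the use of a plain set of message numbers are assumptions for the example, and a real implementation would bound how many numbers it remembers.

```python
from typing import Set


class DuplicateFilter:
    """Hypothetical receiver-side filter that drops already-seen message numbers."""

    def __init__(self) -> None:
        self._seen: Set[int] = set()

    def accept(self, message_number: int) -> bool:
        """Return True the first time a message number arrives, False for resends."""
        if message_number in self._seen:
            return False
        self._seen.add(message_number)
        return True
```

A handler would call `accept()` for every incoming reliable packet and only pass the payload on when it returns True.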

November

  • Issue: Turret quickbuilds don’t despawn.

    Avant Gardens’ robots drop turret quickbuilds, which only despawned when built by a player. Most of the time the players didn’t build the turrets and only smashed the robots, leading to quickbuilds piling up on the AG battlefields. Fixed by introducing a self-destruct timer.

  • Issue: Imaginite isn’t taken by interactables.

    Similarly, survival and wishing wells also didn’t remove imaginite on interaction. This also got fixed.

  • Issue: Survival enemies don’t spawn.

    Similar to the Block Yard issue, spawning logic was also affecting AG Survival. Similarly fixed with alterations to the script.

  • Issue: Unimplemented achievement rewards.

    A few achievements didn’t give out rewards when completed. It turned out this was due to some entries in the game’s original database not being set correctly.

  • Implemented: Achievements for survival times.

    Most achievements regarding survival times were already implemented, but it turned out the game actually uses two separate mechanics for these achievements, and only one of them was implemented before.

  • Issue: Monument finish line won’t end quickbuilds.

    A quickbuild wasn’t completing correctly. It turned out this was because the object overrode the game’s database on quickbuild complete times. Fixed by implementing the override.

  • Issue: Imagination fountain doesn’t drop imagination.

    Some items that were supposed to drop powerups, didn’t. Fixed by implementing the associated script.

  • Pyraknet: Issue: Absolute time used when relative time needed.

    In some packets, pyraknet has to send timing information. It turned out the time it was sending was absolute on Linux (but not on Windows), when it should have been sending time relative to program start. This change enforced using relative time everywhere.

  • Implemented: Random missions.

    The game features some daily missions that are drawn from a random pool and not offered sequentially. Support for this was implemented.

  • Issue: “Stagecraft” achievement not working.

    Another bug caused by a script not working correctly. While some more work on the script remains to be done, it now correctly updates the achievement.

  • Issue: Mission to play a guitar not working.

    Similarly caused by a script, and similarly had some complications appearing that caused full implementation to be postponed. The mission is correctly updated now, however.

  • Issue: Paradox’s Plasma Ball 1 does not kill enemies.

    Caused by a bug in skill parsing. Skill structures remain relatively unknown, mostly due to complex conditions that are hard to reverse engineer. A few similar bugs for other items still exist, and in some cases investigations are pretty much stuck because the condition can’t be analyzed even when the matching code is found. In this case however it was possible to resolve the bug.

  • Issue: Plasma Ball 1 jump attack does nothing but consume imagination.

    In some instances, skills can cancel their execution, as happened with the item here, which doesn’t cast its effect if the player is jumping. The server was however still removing the skill’s imagination cost, even if the skill cast had failed. This was fixed to only take the cost if the skill actually succeeded.

  • Issue: Spider Cutscene being played for everyone

    Some network packets are supposed to be sent to all players, while some should be sent to just one player. In this case, it seems a message that would normally be sent to all had its behavior overridden to only be sent to one player. The server didn’t know about the override, and sent it to everyone, causing the cutscene to appear for everyone in the world. Fixed via override.

  • Issue: Rocco Sirocco doesn’t spin

    A script wasn’t working properly, and wasn’t displaying a cinematic when the mission completed. Fixed through script modifications.
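The relative-time fix in the pyraknet item above can be sketched roughly as follows. Python's `time.monotonic()` has an unspecified reference point, so only differences between two readings are meaningful; recording a start value once and subtracting it gives a time relative to program start on every platform. The function name and the millisecond unit are assumptions for this example, not pyraknet's actual interface.

```python
import time

# Recorded once when the module is loaded, i.e. close to program start.
_START = time.monotonic()


def relative_time_ms() -> int:
    """Milliseconds elapsed since this module was loaded (hypothetical helper)."""
    return int((time.monotonic() - _START) * 1000)
```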

December

  • Issue: Daily Mission “Six Shooter” doesn’t give Plunger Gun.

    A script wasn’t implemented, and didn’t give out a mission item.

December featured several large refactors, which reorganized the server internals and cleaned up the code. They didn’t change the external behavior of the server by much, but nonetheless represent significant improvements, as they affected about half of the server and made it easier to implement new features in the future.

  • Refactor: Server command system.

    The server command system was rewritten to be completely separate from the core server. As a result of this change it’s now possible to delete the entire folder related to commands, and the server will continue to function as normal (just without commands, which aren’t part of normal gameplay).

  • Implemented: Mail attachments.

    Mailing attachments had been supported for a while, and the server was already using it to distribute achievement rewards when inventory space was full. However the interface allowing players to send items per mail was disabled, until this patch changed this.

  • Implemented: Mail cost.

    Sending mail costs coins, and mailing attachments increases this cost by the cost of the items mailed. With this change, the server now subtracts the correct cost from the player when sending mail.

  • Pyraknet & LU server: Refactor: Separate BitStream into read-only and write-only versions.

    Pyraknet also provides a way of reading and writing binary data sequentially, which allows reading/writing single bits. It turned out this was almost never used for reading from and writing to the same stream, so it could be separated to make the intent of each stream clearer.

  • Pyraknet & LU server: Refactor: Change to use composition instead of inheritance.

    Pyraknet had previously been using an inheritance model, where the LU server overrode certain things to implement its own functionality. This worked well and allowed for a lot of freedom for overrides, but didn’t separate the low-level pyraknet stuff from the high-level LU server stuff well. This was changed to use composition instead, where the LU server interacts with the low-level layer mainly via callbacks and delegations, which provided proper separation.

  • Pyraknet & LU server: Refactor: Add type annotations.

    Python is dynamically typed, and as such doesn’t enforce a variable to remain the same data type. However information about the intended data type is useful for documentation and static validation, and so Python has added support for this information in recent versions. With type annotation support improving, it was time to add type information to the pyraknet & LU server projects where it hadn’t already been present.
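To give an idea of what separate read-only and write-only bit streams with type annotations can look like, here is a small sketch. It is not pyraknet's actual BitStream interface: the class names, method names and the most-significant-bit-first order are assumptions chosen for the example.

```python
from typing import List


class WriteStream:
    """Write-only: bits can be appended, but nothing can be read back."""

    def __init__(self) -> None:
        self._bits: List[int] = []

    def write_bit(self, bit: bool) -> None:
        self._bits.append(1 if bit else 0)

    def write_uint(self, value: int, bit_count: int) -> None:
        # Most significant bit first (an assumed bit order).
        for shift in range(bit_count - 1, -1, -1):
            self.write_bit(bool((value >> shift) & 1))

    def to_bytes(self) -> bytes:
        # Pad with zero bits up to the next full byte.
        padded = self._bits + [0] * (-len(self._bits) % 8)
        out = bytearray()
        for start in range(0, len(padded), 8):
            byte = 0
            for bit in padded[start:start + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


class ReadStream:
    """Read-only: consumes bits sequentially and offers no write methods."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0  # current offset in bits

    def read_bit(self) -> bool:
        if self._pos >= len(self._data) * 8:
            raise EOFError("read past end of stream")
        byte = self._data[self._pos // 8]
        bit = (byte >> (7 - self._pos % 8)) & 1
        self._pos += 1
        return bool(bit)

    def read_uint(self, bit_count: int) -> int:
        value = 0
        for _ in range(bit_count):
            value = (value << 1) | int(self.read_bit())
        return value
```

Because a handler that receives a `ReadStream` has no way to write to it, the type alone documents what the handler is allowed to do, and a static checker can verify it.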

January

  • Pyraknet & LU server: Refactor: Various refactors.

    A bunch of minor things were changed and improved, some variables were made more restricted, and some internal systems were reorganized. The changes are relatively specific to the internal implementation, so I’ll keep this paragraph short. However, that doesn’t mean these changes were insignificant – grouped together, they were one of the larger refactors.

  • Issue: Buttons at the AG Monument trigger even when not built.

    Buttons could be pushed even though they hadn’t been built yet, also known as “invisible buttons”. This was caused by an unimplemented check in the component, and was fixed by implementing the check.

  • Pyraknet & LU server: Refactor: Restrict bitstream interface further.

    Previously it was possible to access packet data both by array index notation and via the sequential interface. In practice the random access was rarely used, so this patch removed this functionality and restricted the interface in some other points as well, which makes passing the bitstream to other handlers safer.

Next up

I hope this list could give you some insight into the kind of bugs that pop up in alpha testing, and into what I’ve been working on in the past months.

There have been a bunch of bugs fixed since the start of the alpha, but there’s plenty left to fix. 50 bugs remain open, and while most of them should be possible to fix, it will take some time. I want to make sure the server doesn’t just have a lot of worlds, but also runs smoothly, and for that to work, it’s necessary to fix the open bugs before opening the next alpha stage. Based on the time it took to fix the above bugs, it’s likely it will take some months until the server is ready for the next alpha. But by then, the server should be pretty robust. 🙂 I’ll publish something when the server’s ready, and I’ll continue the progress updates in the meantime.

See you all in-game!
-lcdr

 

Progress Report: August 2017

This is just going to be a pretty technical short summary about what we’ve been doing this month. Don’t expect any big announcements here, think of it more as a “behind the scenes” look at what we’ve been working on. Server development is just one of multiple things we do, so there’s more than that listed here.

Note: Before I start, I’d like to mention something:
In the last post I mentioned that for server costs, we’re going to have to set up some sort of donation system at some point. I also mentioned that we’d like to thank donators in some way, and suggested an in-game cosmetic shirt for this. Some of you have responded that you see this as too similar to “freemium” or “pay-to-win”. We only planned this as a small thank-you, with nothing more in mind, similar to something like donator mentions on twitch. However, we understand your concern, and I want to assure you that we will never be “freemium”, everything is and will be 100% free, no catches. Therefore, to leave no questions open, we have decided not to give out in-game items at all. Rather, items will only be available through gameplay. We still want to thank everyone who considers donating to us to help us with the server, but we want to make clear that the playing experience is the most important thing to us.

Now, on to the actual progress report!

Research

  • Investigations on older lvl formats. Lvl files are used by the game to store world information like NPCs and Enemies, and are vital for a server to work.
  • Investigations on the game’s audio formats.

Research tooling

  • As a side effect of the above investigations, it turned out the current structure definition language used to document formats is not completely capable of describing all formats. Currently missing are
    • Support for pointers in formats
    • Support for recursion
    • Support for definition of larger structures from smaller ones
    • Complex looping

This needs to be added at some point to be able to describe more complex formats. The struct definition language is used by the captureviewer and structparser tools to parse binary files and packets. Luckily so far it has sufficed for them, which is why these issues have only surfaced now.

  • Work on a struct-visualizing hex-editor to help with internal research on formats. Screenshot:

    Parsed and highlighted structs in the viewer, with unparsed part visible below.

    It’s still experimental and too early to release, but it’s helpful for investigations on file formats.

  • Work on the client’s fdb database format.

Documentation

  • Work on cataloging available clients. For research purposes we are always interested in clients from older versions, especially from beta and alpha.
  • The docs are now also available in a faster-loading read-only mode.
  • I’d like to do some more documentation maintenance at some point, but it’s difficult to find time.
  • Work on improving contribution-friendliness to the docs, in cooperation with researchers from other projects. We always welcome contributions, especially if they are well-researched and -sourced.

Development:

Pyraknet

Pyraknet is a minimal port of the network library used by LU, RakNet 3.25, to python. It’s open source and used by multiple projects.

  • Work on making the library more robust, featuring refactors, tests and type annotations.
  • Implemented support for split packet receiving. This was not possible before due to having no proper way of getting the client to send large amounts of data.
  • Investigations on maximum packet size. It seems LU has a packet size of 1200 bytes hardcoded for some reason, even though it’s not really necessary in the network. But just to be sure pyraknet will also use this lower size from now on.
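As a rough picture of what split packets involve: data larger than the maximum packet size is cut into numbered parts on the sending side and put back together on the receiving side once all parts have arrived. The sketch below uses the 1200-byte figure from above but is otherwise hypothetical. It ignores header overhead, and its names and tuple layout are assumptions, not RakNet's wire format or pyraknet's interface.

```python
from typing import Dict, List, Optional, Tuple

MAX_PACKET_SIZE = 1200  # the size LU appears to have hardcoded

# (part index, total part count, part data)
Part = Tuple[int, int, bytes]


def split_payload(payload: bytes, max_size: int = MAX_PACKET_SIZE) -> List[Part]:
    """Cut a payload into numbered parts of at most `max_size` bytes each."""
    chunks = [payload[i:i + max_size] for i in range(0, len(payload), max_size)]
    if not chunks:
        chunks = [b""]  # an empty payload still yields one (empty) part
    return [(index, len(chunks), chunk) for index, chunk in enumerate(chunks)]


class Reassembler:
    """Collects the parts of one split packet, in any order."""

    def __init__(self) -> None:
        self._parts: Dict[int, bytes] = {}

    def add(self, index: int, count: int, chunk: bytes) -> Optional[bytes]:
        """Store a part; return the full payload once all parts are present."""
        self._parts[index] = chunk
        if len(self._parts) < count:
            return None
        return b"".join(self._parts[i] for i in range(count))
```

Storing parts by index means that both out-of-order arrival and resent parts are handled without extra logic.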

Server

  • Features seen in our video for this month. Side note: LU is bananas.
  • Ongoing work to test the server against edgecases, high loads, network issues and others. This will take a lot of our time, so it’s pretty likely we won’t be able to release a video next month.

Website

  • The website has been online for a month now, and everything’s working well. Thank you for your threads in the forums, I’m glad the atmosphere is so nice there.
  • And of course thanks to everyone commenting on twitter and youtube as well. I read all of your comments, and I really appreciate them 😊

Notes on Alpha

It will still take a few months to prepare everything for an alpha release. Please be patient until then. We’ll post updates about release and admission phase dates once we have something to report, so you don’t have to worry about missing them.

 

See you all in-game!
– lcdr