The feasibility of embedded software development with Lua
About Lua
According to Programming in Lua by R. Ierusalimschy [1], Lua is a small scripting language written in C that is compatible with the C89 standard. Lua is much simpler to learn than C, and it supports higher-level abstractions such as object orientation far more easily.
Lua also has several characteristics, listed below, that make it a good scripting language to embed.
1. Lua was designed from the start to be embedded into another application; therefore, it offers a set of libraries designed to be easily integrated into C.
2. Using the Lua C API, developers can easily expose C code as a library that Lua code can call. By writing additional libraries in C, developers can also extend the capabilities of Lua scripts.
3. Lua offers garbage collection. Developers do not need to worry about memory leaks or critical bugs that could occur when managing dynamic memory manually.
4. The Lua source code is designed to be easy to tweak. The core Lua engine can be as small as about 100 Kilobytes when unused libraries are excluded, which is beneficial when porting it to embedded systems or small microcontrollers with memory constraints.
5. Additionally, Lua is free of charge under an open-source license that allows its use in commercial applications. The license also allows the source code to be modified and customized.
Running Lua on embedded systems without operating systems
Why?
Embedding Lua makes it possible to implement some of the functionality in Lua rather than C. That is, developers can use Lua wherever a feature is easier to develop in Lua. Moreover, Lua offers a C API that allows very easy binding to C, as demonstrated in this project. Through this binding, functionality written in Lua can interface with a C wrapper library that exposes the hardware layer, which further extends the capabilities of the Lua application.
Super loop-based approach without an operating system
Depending on the system requirements, embedded system designs can be divided into two main approaches.
The first approach is to run the system on top of an operating system (OS). In particular, a real-time operating system is needed if the system is time-critical and complex, with many tasks and preemption. Preemption is the ability to suspend the currently running task when a higher-priority task needs to be executed. Usually, the OS kernel is responsible for managing preemption.
Moreover, an OS provides middleware, which is software that sits between the kernel and applications, such as a filesystem. An OS should be adopted if the system requires middleware.
The second approach is the super loop. If the application has only a small number of tasks to manage, the super loop-based approach is widely used. In this design, all tasks are executed sequentially within an infinite loop, so the tasks in the embedded application repeat for as long as the system is operating.
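The super loop structure can be sketched in C as follows. This is a minimal illustration, not code from this project: the two tasks and their counters are hypothetical placeholders for real work such as reading a sensor or polling a communication interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tasks; each does a small unit of work and returns quickly. */
static uint32_t sensor_reads;
static uint32_t comm_polls;

static void task_read_sensor(void) { sensor_reads++; }
static void task_poll_comms(void)  { comm_polls++; }

typedef void (*task_fn)(void);

/* The tasks are listed in the order in which they run on every pass. */
static const task_fn tasks[] = { task_read_sensor, task_poll_comms };

/* One pass of the loop: run every task once, in a fixed order. */
void super_loop_pass(void)
{
    for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++) {
        tasks[i]();
    }
}

/* On a target, the firmware entry point would call this and never return. */
void super_loop(void)
{
    for (;;) {
        super_loop_pass();
    }
}
```

Because no task can be preempted, every task has to return quickly; a long-running task delays all the others by the same amount.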
This article shows the feasibility of embedding Lua into super loop-based embedded systems.
My target board: STM32F415 microcontroller
The STM32F415 is a 32-bit microcontroller manufactured by STMicroelectronics. It was chosen for this project because it has an ARM Cortex-M4 core and supports the standard newlib library.
Because a standard library is available, the Lua source code can be compiled for the target without complicated modifications. Furthermore, the results of this project can be easily applied to other projects, since many targets support a standard library.
Additionally, this microcontroller offers 1 Megabyte of flash memory and 192 Kilobytes of RAM. In this project, the compiled binary of the Lua source code resides in flash memory, which increases flash usage. 1 Megabyte of flash is enough to study the memory impact of embedding Lua.
Compiling Lua source code
The Lua source code is written in C, and it needs to be compiled for the target so that the Lua Virtual Machine can be ported.
The Lua source code is compatible with the C89 standard, which means that it can be compiled as long as the target supports the ANSI standard C library.
I used an STM32 microcontroller, which offers a library called Newlib-nano. Newlib is a lightweight version of the standard ANSI C/C++ library intended for embedded systems, and Newlib-nano is a variation of Newlib that extends support to additional MCUs such as ARM Cortex-M based MCUs.
The table below shows the size of the Lua source code objects compiled with Newlib-nano. Please note that the Lua version I used in this work was 5.1.
The total size of all Lua 5.1 source objects was 122.020 Kilobytes. Considering that the STM32 microcontroller offers 1 Megabyte of flash, this increase amounts to roughly 12% of the available flash.
Two source files were excluded: “lua.c” and “luac.c”. These files contain the main programs for the command-line interpreter and the standalone bytecode compiler, which are not needed when Lua is ported without a command shell.
Where to store the Lua scripts to execute on a target?
In order to execute the Lua scripts and libraries on a target, the scripts themselves had to be stored in flash memory.
Filesystem
A filesystem in an OS kernel facilitates file and directory manipulation. Specifically, a filesystem manages how and where data is stored in memory so that clients such as users or applications can easily access that data. It also manages the structure of files, directories, and metadata, as well as operations such as read and write. The figure below shows conceptually how a filesystem manages client access to data stored in memory and the read and write operations.
In general, OSes like Linux or Windows include filesystems so that clients do not need to consider where to store and load files or how to manage them.
An embedded system without an OS cannot use a filesystem offered by an OS kernel. This means that clients must manually manage the memory addresses where files are stored.
Without a filesystem
Whichever filesystem is chosen, embedding one costs extra memory and processing time. For this reason, instead of using a filesystem, each Lua script was stored directly in memory for this work. The figure below shows that the image containing all the Lua scripts is stored at a certain memory address. The image also contains the starting address of each script.
The starting address was needed when each script was loaded in the C application. For instance, ‘library_1.lua’, which was stored at 0x08010000, was accessed in C as below:
“luaL_loadbuffer” loads a buffer as a Lua chunk without running it, and this Lua chunk can then be executed whenever it is called in the C application. However, this solution turned out to be extremely inefficient when there were a large number of Lua scripts to store, because the starting address of each Lua script had to be tracked and managed. Also, the buffer had to be loaded once per Lua script.
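The bookkeeping this approach requires can be sketched as a small lookup table in C. This is an illustration under assumptions, not the code used in this work: the “ScriptEntry” type and “find_script” function are hypothetical names, and on a real target the “start” field would hold a flash address such as 0x08010000 cast to a pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical index entry: one per Lua script stored in the flash image. */
typedef struct {
    const char *name;   /* script name, e.g. "library_1.lua" */
    const char *start;  /* starting address of the script text */
    size_t      size;   /* length of the script in bytes */
} ScriptEntry;

/* Linear search by name; returns NULL when the script is not in the table. */
const ScriptEntry *find_script(const ScriptEntry *table, size_t count,
                               const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return &table[i];
        }
    }
    return NULL;
}
```

An entry found this way supplies the buffer, size, and chunk name arguments of “luaL_loadbuffer”. Every script added to the image needs one more entry and one more load call, which is the overhead described above.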
Lua amalgamation
To improve the way Lua scripts are stored and loaded, all the Lua scripts can be amalgamated and concatenated into a single file. The number of buffer loading operations is then reduced to one, because all the scripts are combined into a single file.
Before amalgamation, we need to understand how the Lua VM looks for a module
The Lua VM, also known as the interpreter, is the engine that executes Lua bytecode. When the Lua Virtual Machine executes a Lua script, it looks up the required Lua modules in a Lua table called “package.preload”.
Assuming the Lua VM looks for a module named “lua_to_c”, the search proceeds in the following sequence.
First, the Lua VM searches the “package.preload” table to see whether a loader for the module has been preloaded. The Lua standard libraries, in contrast, are registered when the VM is initialized, which is why standard functions like ‘print()’ can be used by default.
If the “lua_to_c” module is not in the preload table, the Lua VM searches the paths in “package.path” and “package.cpath”, which typically cover the local project folder as well as the shared library folder.
Lua Amalg tool
‘Amalg’ is a tool that packages Lua scripts and their dependent modules into a single file by adding the dependent modules to the “package.preload” table. It was developed by P. Janda and is free software under the MIT license [2].
The tool takes the source file and a list of the required modules as arguments and produces a single output file.
The figure below demonstrates how amalgamation is done. “lua_application.lua” required two library modules, which were added to the “package.preload” table. In this case, the Lua VM did not have to search directories, since it was able to find the required modules in the preload table.
After the amalgamation process, all the amalgamated files were concatenated to produce a single Lua file. Loading this concatenated file in the C application loaded all the necessary Lua modules at once. It was also possible for one module to require other modules without any issues.
Note: The Lua file was converted into SREC format to flash the target. SREC stands for S-Record, a format commonly used for programming flash memory.
At this point, the Lua scripts and libraries, as well as the compiled interpreter, are all loaded onto the target. Next, we need to understand how to run the code from our C application.
Running Lua scripts
Lua Virtual Machine Initialization
The Lua Virtual Machine executes Lua bytecode on a target. Bytecode is an intermediate form of the code that is created by compiling Lua source code. The Lua VM has to be initialized in the target application. The figure below gives an overview of the Lua VM initialization process.
First, a new Lua interpreter state is created through the Lua C API, which allocates its memory. Then, the standard libraries shown on the left are registered. After the standard libraries are registered, the Lua source strings are loaded and compiled into bytecode by the Lua compiler. Finally, this bytecode is executed by the Lua Virtual Machine. This process needs to be implemented in the C application so that the Lua VM is initialized during application startup.
Lua C API?
Lua offers a simple but very powerful C API. This C API extends the capabilities of a C module, since it allows developers not only to run Lua code but also to access Lua objects from C. Similarly, developers can access C libraries and call C functions from Lua.
Lua and C communicate through a shared stack. Specifically, the Lua VM is initialized along with a memory allocator, and the VM then allocates the stack on the heap to share data between Lua and C. In C, data is retrieved from this shared stack.
Saving memory
For many embedded systems, memory is limited. When Lua source code is loaded and compiled into bytecode, the loading and compiling steps consume a large amount of memory.
In fact, compilation did not succeed when I tested this on the STM32. While Lua scripts were being loaded on the target, a fault exception occurred due to a lack of memory. The error appeared while the Lua parser was reading the input text, because the Lua VM consumes RAM while parsing source code into bytecode. Since the target had limited memory resources, the Lua parser required more RAM than was available.
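One possible way to measure how much RAM the parser needs, and to fail with an error rather than a fault, is to give the VM an allocator that enforces a budget. The sketch below is not the method used in this work. It is a hypothetical allocator (“HeapBudget” and “budget_alloc” are made-up names) written to match the signature of the “lua_Alloc” callback that “lua_newstate” accepts in Lua 5.1, and it assumes “malloc”-family functions are available on the target.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record; not part of Lua. */
typedef struct {
    size_t used;   /* bytes currently allocated */
    size_t peak;   /* highest value 'used' has reached */
    size_t limit;  /* RAM budget in bytes */
} HeapBudget;

/* Same parameter list as Lua 5.1's lua_Alloc: ud is user data, ptr the
 * block being resized, osize its old size, nsize the requested size. */
void *budget_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    HeapBudget *b = (HeapBudget *)ud;

    if (ptr == NULL) {
        osize = 0;              /* a new block has no previous size */
    }
    if (nsize == 0) {           /* a request for size 0 means "free" */
        free(ptr);
        b->used -= osize;
        return NULL;
    }
    /* Refuse growth that would exceed the budget. */
    if (nsize > osize && nsize - osize > b->limit - b->used) {
        return NULL;
    }
    void *p = realloc(ptr, nsize);
    if (p == NULL) {
        return NULL;            /* the underlying heap is exhausted */
    }
    b->used = b->used - osize + nsize;
    if (b->used > b->peak) {
        b->peak = b->used;
    }
    return p;
}
```

When the allocator returns NULL, Lua reports a memory error to the caller, and the “peak” field read after loading a script gives an estimate of the RAM that the load required.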
Pre-compilation
This issue was resolved by precompiling the input scripts into bytecode before loading, because loading bytecode instead of source text skips the compilation step.
The Lua Virtual Machine can determine whether an input chunk is source code or bytecode by reading its header block. The header block is the first 12 bytes of a precompiled chunk. It contains not only the bytecode signature but also useful Lua configuration values, as listed below [3]:
The hex values in the figure below were the first 12 bytes of the Lua bytecode created in my work. The first 4 bytes were “0x1B4C7561”, which indicates that the input chunk is Lua bytecode. “51” indicates that the chunk was compiled with Lua version 5.1. The bytecode chunk was cross-compiled for an STM32 microcontroller, which is based on a 32-bit processor using little-endian format; the 7th to 11th bytes describe this configuration. The last byte was 1, showing that the chunk was precompiled for the integer-only configuration.
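The header layout described above can be decoded with a few lines of C. The sketch below follows the Lua 5.1 header layout documented in [3]; the “Lua51Header” type and “parse_lua51_header” function are hypothetical names introduced for illustration and are not part of the Lua API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LUA51_HEADER_SIZE 12

/* Hypothetical decoded view of the 12-byte Lua 5.1 chunk header. */
typedef struct {
    uint8_t version;           /* 0x51 for Lua 5.1 */
    uint8_t format;            /* 0 for the official format */
    bool    little_endian;     /* byte 7: 1 means little-endian */
    uint8_t size_int;          /* byte 8: sizeof(int) on the target */
    uint8_t size_size_t;       /* byte 9: sizeof(size_t) on the target */
    uint8_t size_instruction;  /* byte 10: size of a VM instruction */
    uint8_t size_number;       /* byte 11: sizeof(lua_Number) */
    bool    integral_number;   /* byte 12: 1 means integer-only numbers */
} Lua51Header;

/* Returns false if buf is too short or does not start with "\x1BLua". */
bool parse_lua51_header(const uint8_t *buf, size_t len, Lua51Header *out)
{
    static const uint8_t signature[4] = { 0x1B, 'L', 'u', 'a' };

    if (len < LUA51_HEADER_SIZE || memcmp(buf, signature, 4) != 0) {
        return false;
    }
    out->version          = buf[4];
    out->format           = buf[5];
    out->little_endian    = (buf[6] == 1);
    out->size_int         = buf[7];
    out->size_size_t      = buf[8];
    out->size_instruction = buf[9];
    out->size_number      = buf[10];
    out->integral_number  = (buf[11] == 1);
    return true;
}
```

A check like this could run on the PC after cross-compilation to confirm that a chunk matches the target configuration before it is flashed.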
Cross-compiler
To perform the pre-compilation, a Lua cross-compiler is needed. It compiles the scripts on a PC for a different hardware platform so that the compiled bytecode is executable on ARM targets. Lua 5.1 includes a compiler called “luac”, but this compiler does not support cross-compilation. One solution is to compile the source and dump the compiled bytecode on a microcontroller that has the same architecture as the STM32 but much more RAM, so that it is able to compile the code. The compiled bytecode should then be executable on a Superbean device; however, this is inefficient since it requires extra hardware and is time-consuming.
Alternatively, eLua offers a customized version of luac with cross-compilation support [4]. This cross-compiler accepts a few parameters that describe the target board configuration and outputs bytecode accordingly. The “-ccn” flag configures the type and size of “lua_Number”, which here are integer and 32 bits, respectively. The “-cce” option configures endianness, which is set to little in this case.
After the precompiled bytecode was loaded on the target, the code executed successfully and caused no memory issues.
Besides resolving the memory issue, pre-compilation had other benefits. Because the Lua compilation step was skipped, faster loading was expected.
In my case, the precompiled Lua chunk was also smaller than the corresponding source file by 3.451 Kilobytes, as shown in the table on the left.
Overall
The following shows the overall build and Lua code execution process, including the pre-compilation step.
References
[1] R. Ierusalimschy, Programming in Lua, 3rd ed., 2014, chapter 4, p. 293–365.
[2] P. Janda, ‘Amalgamation of Lua Modules/Scripts’ [Online]. Available: https://github.com/siffiejoe/lua-amalg [Accessed July 3, 2018]
[3] K. H. Man, ‘A No-Frills Introduction to Lua 5.1 VM Instructions’, Version 0.1, March 2006.
[4] eLua Doc, ‘Generic info’ [Online]. Available: http://www.eluaproject.net/doc/v0.8/en_using.html [Accessed June 29, 2018]