21 March 2016
Introducing the UTF-8 string plugin
One potential challenge for app developers is “localizing” their apps for international audiences. This may be as simple as handling a person’s name that contains non-ASCII characters, or as involved as fully supporting a language like Japanese, which uses its own character sets.
The core issue for Corona developers is that Lua’s string functions treat a string as merely a series of bytes and are unaware of multi-byte characters. This causes problems if you want to know how many characters are in a string containing multi-byte characters because, in this case, Lua will simply count the bytes. Even more issues can arise if you need to extract sub-strings or convert a non-ASCII string to uppercase or lowercase. All in all, this can be considerably frustrating to developers who build apps for worldwide distribution.
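To make the byte-versus-character problem concrete, here is a minimal sketch in plain Lua, with no plugin required. It assumes the source file is saved as UTF-8, and the helper `countCharacters()` is a hypothetical name introduced only for illustration; it is not part of Lua or of any Corona library, and it assumes its input is valid UTF-8.

```lua
-- Assumption: this file is saved as UTF-8, so each Cyrillic letter below
-- is stored as two bytes.
local word = "Привет"  -- 6 letters

-- Lua's built-in length function counts bytes, not characters.
print( string.len( word ) )  -- 12

-- string.sub() also works on bytes, so this returns only the first byte
-- of the first letter, which is not a complete character.
local firstByte = string.sub( word, 1, 1 )
print( string.len( firstByte ) )  -- 1

-- Hypothetical helper for illustration: count characters by skipping
-- UTF-8 continuation bytes, which always fall in the range 0x80-0xBF.
local function countCharacters( s )
    local count = 0
    for i = 1, string.len( s ) do
        local byte = string.byte( s, i )
        if byte < 0x80 or byte > 0xBF then
            count = count + 1
        end
    end
    return count
end

print( countCharacters( word ) )  -- 6
```

Counting is the easy case. Slicing, case conversion, and pattern matching all need the same awareness of character boundaries, which is why hand-rolled helpers like this quickly become tedious.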
The solution?
Recently, Corona Labs released the UTF-8 plugin to help alleviate these issues. The plugin’s functions closely mirror the existing string library functions, except that the UTF-8 equivalents handle multi-byte strings. For instance, just as you might use string.match() on a normal ASCII string, you can substitute utf8.match() for non-ASCII strings.
This plugin also introduces several new functions for advanced purposes, helping you deal with character positions, offsets, code points, sub-string insertion/removal, and more. You can see a complete list and usage details in our documentation.
Usage
As with all Corona plugins, you need to include it in your build.settings file:
    settings =
    {
        plugins =
        {
            ["plugin.utf8"] =
            {
                publisherId = "com.coronalabs"
            },
        },
    }
Then, in any module where you want to use UTF-8 functions, simply require() the plugin as usual:
    local utf8 = require( "plugin.utf8" )
After that, wherever you would call a string.[methodName] function, call utf8.[methodName] instead. For example, notice the output values of the respective “length” functions (string.len() and utf8.len()) on an identical string:
    -- The Russian alphabet
    print( string.len( "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ" ) )  --prints 64
    print( utf8.len( "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ" ) )  --prints 32
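The same substitution should apply to sub-strings and case conversion. The sketch below assumes that the plugin exposes utf8.sub() and utf8.upper() as character-aware counterparts of string.sub() and string.upper(); check the plugin documentation for the exact function list before relying on them. The pcall() guard is only there so the snippet also runs outside a Corona project, where the plugin is not available.

```lua
-- Assumption: inside a Corona project with "plugin.utf8" listed in
-- build.settings, this require() succeeds. Elsewhere it fails and we
-- fall back to a message.
local ok, utf8 = pcall( require, "plugin.utf8" )

-- Assumption: this file is saved as UTF-8.
local word = "привет"  -- 6 letters, 12 bytes

-- Byte-based length from the standard string library.
print( string.len( word ) )  -- 12

if ok then
    -- Character-based length from the plugin.
    print( utf8.len( word ) )

    -- Assumed to mirror string.sub(), but indexed by character, so this
    -- should return the first three letters rather than three bytes.
    print( utf8.sub( word, 1, 3 ) )

    -- Assumed to mirror string.upper(), with support for non-ASCII letters.
    print( utf8.upper( word ) )
else
    print( "plugin.utf8 is not available; run this inside a Corona project" )
end
```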
Conclusion
As you can see, the UTF-8 plugin offers some valuable new capabilities to Corona developers, in particular those developing apps for international distribution. To learn more, please refer to the documentation or discuss this plugin in the Corona forums.
RuneW
Posted at 06:54h, 22 March
Thanks!
madclown
Posted at 07:21h, 11 May
That’s a great and much needed plugin.
Andrzej Futuretro
Posted at 09:31h, 16 June
Late to the party here but wanted to show my appreciation for this great plugin!