Tuesday, April 19, 2016

Beware of SQLite and the Turkish Locale

Today I came across a truly puzzling issue on Android. An otherwise tried and tested application crashed consistently when running on a device using Turkish as the current locale.

App crashing on a Turkish device

The problem occurs in a large app developed for a customer, inside a proprietary binary component, so no direct debugging was possible. All I had available was a vague stack trace showing the root problem to be a NullPointerException from trying to parse an integer:

Caused by: java.lang.NullPointerException: Attempt to invoke virtual method 'int java.lang.Integer.intValue()' on a null object reference

Not much to go on, the goal was now to figure out just what could cause such a problem just from running on a device using Turkish Locale.



Having spent some time on Google, searching for other people's trouble with the Turkish Locale, it became clear that there are issues with lowercase Strings in Turkish. I started working on a small isolated app which would aid in narrowing down the exact issue. It wasn't long until the suspicion I had turned out to be spot on.

SQLite with Turkish Locale

The problem was caused by relying on the device Locale for representing SQL Strings and the fact that the SQLite parser must be converting the SQL literals to uppercase. That becomes a problem, because in Turkish the lower case "i" becomes "İ" in uppercase. Inside SQLite, the parser is therefore seeing "İNSERT" rather than "INSERT" which is of course not valid SQL. This is actually specified in the Android documentation for the Locale class and the subject was also covered by none other than Jeff Atwood some years ago.

To reproduce the problem, create a new project in Android Studio with an empty activity. Place the following XML for the layout:

<?xml version="1.0" encoding="utf-8"?> 
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android" 
    android:layout_width="match_parent" 
    android:layout_height="match_parent"> 
 
    <Button 
        android:layout_width="300dp" 
        android:layout_height="48dp" 
        android:text="SQL insert in lowercase" 
        android:gravity="center_horizontal" 
        android:onClick="insert"/> 
 
</RelativeLayout> 


...and add the following Java code to the activity:
public class MainActivity extends AppCompatActivity { 
 
    private final static String DML_CREATE = "CREATE TABLE IF NOT EXISTS demotable(Name VARCHAR);"; 
    private final static String SQL_INSERT = "insert into demotable values('John Doe');"; 
 
@Override
protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); try(SQLiteDatabase demoDatabase = openOrCreateDatabase("demotable",MODE_PRIVATE,null)){ demoDatabase.execSQL(DML_CREATE); } setContentView(R.layout.activity_main); }
public void insert(View view){ try(SQLiteDatabase demoDatabase = openOrCreateDatabase("demotable",MODE_PRIVATE,null)){ demoDatabase.execSQL(SQL_INSERT); } } }

Although being perfectly valid Java and SQL, the code above will crash consistently when running on a device with a Turkish Locale!

Caused by: android.database.sqlite.SQLiteException: near "İNSERT": syntax error (code 1): , while compiling: İNSERT İNTO DEMOTABLE VALUES('JOHN DOE');

The stack trace makes it blatantly obvious what happens. If only I had such a nice stack trace originally I could've saved a few hours, but such is life when you depend on proprietary closed source components. :(

Solutions

There are two typical proposals for how to solve the issue. This can either be done by always converting SQL Strings to upper-case:
 
 
public void insertUppercase(View view){ try(SQLiteDatabase demoDatabase = openOrCreateDatabase("demotable",MODE_PRIVATE,null)){ demoDatabase.execSQL(SQL_INSERT.toUpperCase()); } }

...or (better in my opinion) make sure SQL Strings are always interpreted using the root locale by converting to lower-case while providing the neutral root locale:
 
 
public void insertLowercaseEnglish(View view){ try(SQLiteDatabase demoDatabase = openOrCreateDatabase("demotable",MODE_PRIVATE,null)){ demoDatabase.execSQL(SQL_INSERT.toLowerCase(Locale.ROOT)); } }

Although I have not tested it, it's also likely that a solution could be to externalize the SQL as a simpler character set (ASCII) rendering the the issue moot. However, most developers will probably just embed SQL in Java unicode Strings. Feel free to comment on this if you have other solutions.

Lesson learned

Something tells me this is a very common bug out there, the intricate details of "i" in the Turkish Locale are probably only known to Turks. The only reason why more Java developers don't run into this, is probably because they typically operate and control the host runtime. However, when dealing with an app, the environment is not as neatly encapsulated and testable so this problem bubbles up to the surface and hits you hard.

The lesson learned must be to never rely on the default Locale, not even for something as innocent as Java Strings. While we are able to remedy the issue when using SQL as an embedded DSL, one can easily imagine the same bug sneaking in countless other ways. For example, annotations come to mind, where you do not have the luxury of being able to manipulate the String before its sent to be parsed by some annotation processor!

Post a Comment