Although it is not the default keyboard on most non-Google brands, Gboard is a popular app among foreign-language users because of its good support and ease of use, particularly with dozens of Asian and Indian languages.
As a keyboard app, it monitors and analyzes your keystrokes, offering suggestions and corrections for spelling and grammar, sentence completion and even emoji suggestions.
Now for the interesting part. For the last few versions, it has also retained a lot of data (i.e., user keystrokes!) in its cache. This has been seen at least since the January 2020 version (v 8.3.x). From a DFIR perspective, that is GOLD. For a forensic examiner, this can possibly show you data that was typed by the user in an app that is now deleted, or show messages that were typed and then deleted, or messages from apps that have the disappearing message feature turned on! Or data entered into fields on web pages/online apps (that wouldn't be stored locally at all). For apps that don't track when a particular item was created or modified, the cached timestamps could also be useful.
Note - The Signal app wasn't specifically tested to see if data from that app is retained, but based on what we can see here, it seems likely those messages would end up here too. All testing was on a Pixel 3 running the latest Android 11, using the default keyboard and default settings. This was also verified on other images taken earlier. Josh Hickman's Android 10 Pixel 3 image was also used, and Josh was able to verify that messages sent via Telegram and WhatsApp were present here. The specific versions of Gboard databases studied were:
- 8.3.6.250752527 (on Android 10)
- 8.8.10.277552084 (on Android 10)
- 10.0.02.338070508 (on Android 11)
Location
Gboard's app data (sandbox) folder is located here: /data/data/com.google.android.inputmethod.latin/databases/
Here you might see a number of databases that start with trainingcache*. These are the files that contain the caches.
Figure 1 - Contents of Gboard's databases folder (v 10.0.02.338070508) |
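As a rough illustration of locating these files in an extracted image, here is a minimal Python sketch. It assumes the extraction mirrors the device's file system under some root folder; the function name find_training_caches and that layout are assumptions for illustration, not part of any tool.

```python
from pathlib import Path

# Package folder taken from the path given in the article.
GBOARD_DB_DIR = ("data", "data", "com.google.android.inputmethod.latin", "databases")


def find_training_caches(extraction_root):
    """Return trainingcache* files found under Gboard's databases folder.

    extraction_root is assumed (hypothetically) to mirror the device
    file system, e.g. <root>/data/data/<package>/databases/.
    """
    db_dir = Path(extraction_root).joinpath(*GBOARD_DB_DIR)
    if not db_dir.is_dir():
        return []
    # The wildcard also picks up any SQLite companion files
    # (such as -wal or -journal), if they happen to be present.
    return sorted(p for p in db_dir.glob("trainingcache*") if p.is_file())
```

Calling find_training_caches on the root of an extraction returns a sorted list of paths, or an empty list if the Gboard folder is not present.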
trainingcache2.db (v 10.0.02.338070508)
The table training_input_events_table contains information about the application in focus, its field name (where input was sent), the timestamp of the event and a protobuf BLOB stored in the _payload field, as shown in the screenshot below.
Figure 2 - training_input_events_table (not all columns shown) |
The highlighted entry above is from an app that has since been deleted. The _payload BLOB is decoded in the screenshot below, highlighting the text typed by the user in the Email input field. The protobuf also contains all of the data included in the other columns of the table.
Figure 3 - Decoded Protobuf from _payload column |
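To pull these rows out for offline review, something like the following Python sketch could be used. Only the table name and the _payload column come from the description above; the remaining column names are discovered at run time rather than assumed, and the helper names read_table and payloads are made up for this example. The database is opened read-only so the evidence copy is not modified.

```python
import sqlite3
from pathlib import Path


def read_table(db_path, table="training_input_events_table"):
    """Return (column_names, rows) for a table, opening the file read-only."""
    # as_uri() yields a file:/// URI; mode=ro asks SQLite not to write to it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        cur = con.execute(f'SELECT * FROM "{table}"')
        columns = [d[0] for d in cur.description]
        rows = cur.fetchall()
    finally:
        con.close()
    return columns, rows


def payloads(columns, rows):
    """Return the raw protobuf BLOBs from the _payload column."""
    idx = columns.index("_payload")
    return [row[idx] for row in rows if isinstance(row[idx], bytes)]
```

Printing the returned column list is a quick way to see which fields a given Gboard version actually stores before writing a more specific query.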
In most instances, however, the protobuf looks like the one in the screenshot below, where the input needs to be pieced back together as shown. Here you can see the words the user typed as well as the suggestions offered by the app. Suggestions can be for spelling, grammar, contact names, or something else.
Figure 4 - Decoded protobuf - reconstructing user input |
Above, you can see the words typed and suggestions offered. On an Android device, the suggestions appear as shown below while typing.
Figure 5 - Android keyboard highlighting suggested words |
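Since no schema is available for the _payload protobuf, a schema-less pass over the standard protobuf wire format is one way to get at the strings inside it. The sketch below is a minimal, hedged example of that idea and not the decoder used for the screenshots above: it walks the fields generically and uses a simple heuristic (printable UTF-8 is treated as text, anything else is tried as a nested message), which can misclassify some values. The function names are illustrative.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")


def decode_raw(buf):
    """Return a list of (field_number, wire_type, value) tuples."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 0x07
        if field_no == 0:
            raise ValueError("invalid field number 0")
        if wire_type == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type in (1, 5):               # fixed 64-bit / 32-bit
            end = pos + (8 if wire_type == 1 else 4)
            if end > len(buf):
                raise ValueError("truncated fixed-width field")
            value, pos = buf[pos:end], end
        elif wire_type == 2:                    # length-delimited
            length, pos = read_varint(buf, pos)
            end = pos + length
            if end > len(buf):
                raise ValueError("truncated length-delimited field")
            value, pos = buf[pos:end], end
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_no, wire_type, value))
    return fields


def extract_strings(buf):
    """Collect text values in field order, descending into nested messages."""
    texts = []
    for _field_no, wire_type, value in decode_raw(buf):
        if wire_type != 2 or not value:
            continue
        try:
            text = value.decode("utf-8")
        except UnicodeDecodeError:
            text = None
        if text is not None and text.isprintable():
            texts.append(text)                  # heuristic: looks like text
            continue
        try:
            texts.extend(extract_strings(value))
        except ValueError:
            pass                                # opaque bytes, skip
    return texts
```

The list returned by extract_strings keeps the order in which strings appear in the payload, so typed words and offered suggestions still have to be told apart by comparing against the field numbers reported by decode_raw.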
trainingcache3.db (v 10.0.02.338070508)
In version 8.x, this same database is named trainingcache2.db, and follows exactly the same format. The table s_table looks similar to the training_input_events_table seen earlier. However, the _payload field does not store the keystrokes here.
Figure 6 - s_table |
Figure 8 - tf_table entries |
Figure 9 - Reading keystroke sessions from tf_table |
We can combine (join) this data with the data from s_table to recreate the same information we got from training_input_events_table earlier.
Figure 10 - joined tables |
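The join itself can be sketched in Python as below. The table names s_table and tf_table are from the article, but the columns that link the two tables are not named in the text above, so they are passed in as parameters (s_key and tf_key) and must be identified by inspecting the database first. The function name join_tables is likewise just for illustration.

```python
import sqlite3


def join_tables(con, s_key, tf_key):
    """LEFT JOIN tf_table to s_table on caller-supplied key columns.

    s_key and tf_key are placeholders: the real linking columns have to
    be determined from the database being examined.
    """
    for name in (s_key, tf_key):
        # Column names cannot be bound as SQL parameters, so validate them.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected column name: {name!r}")
    sql = (
        "SELECT tf.*, s.* FROM tf_table AS tf "
        f'LEFT JOIN s_table AS s ON s."{s_key}" = tf."{tf_key}"'
    )
    cur = con.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()
```

A LEFT JOIN is used here so that tf_table rows without a matching s_table row are still returned (with empty s_table columns) rather than silently dropped.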
Figure 11 - ALEAPP output showing trainingcache parsed output |