Disk caching on Android

Frequently when writing Android apps I’ll find the need for a cache. The classic case is Bitmaps- adding all your bitmaps to a cache and grabbing them solely from the cache can solve a lot of OOM issues. And Google has written a great class, LruCache, to easily enable memory caching. The problem is that memory caching frequently isn’t enough. If I have an app that needs to download lots of images from the net, I may need tens or hundreds of MB of images, and I don’t want to redownload them every time the app restarts or an image falls out of a small memory cache. At the same time I don’t want to use up ever-increasing amounts of disk space. The solution is a disk cache.

Unfortunately Android doesn’t have built-in disk cache support. It does have a cache directory, but there’s no control over it, and the documentation itself warns you to manage those files on your own rather than rely on the system. So we need to write one ourselves. The goal here is a class that allows caching of arbitrary data types to disk, automatic reading in of those types with a simple interface, persistence across shutdown, and decent performance.

More or less, we want an interface that supports, at a minimum, this:

package com.gabesechan.android.reusable.cache;

public interface Cache<KeyType, ValueType> {
  ValueType get(KeyType key);
  boolean put(KeyType key, ValueType value);
  void remove(KeyType key);
  void clear();
}

The first major question is data structures and performance. We want a quick key->value lookup. That screams hash table. It even has the nice advantage that you could easily find the files on disk if you named each one after its key’s hash. For lookup that works perfectly. In fact you could avoid having a hash map at all- just use the hash as the filename and check whether the file exists. Of course we don’t want to do that, because we don’t want to perform all those disk accesses- we’d rather keep an in-memory hash of the Files that actually exist.
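As a quick illustration, the rejected "no map at all" variant is only a few lines- the key and directory here are hypothetical stand-ins:

```java
import java.io.File;
import java.nio.file.Files;

public class HashLookupSketch {
    // The rejected "no map at all" variant: the key's hash IS the filename,
    // so checking the cache is just an existence check on disk
    static boolean isCached(File cacheDir, Object key) {
        return new File(cacheDir, Integer.toString(key.hashCode())).exists();
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("hashdemo").toFile();
        String key = "https://example.com/image.png"; //hypothetical key
        System.out.println(isCached(dir, key)); // false- nothing cached yet
        new File(dir, Integer.toString(key.hashCode())).createNewFile();
        System.out.println(isCached(dir, key)); // true- the file now exists
    }
}
```

Every `isCached` call costs a disk stat, which is exactly why the in-memory hash of Files is preferable.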

The problem with a hash is that when we add new data to the cache, we may go over the size limit. At that point we need to kick items out, and we would prefer to do it on an LRU basis. An LRU list screams queue (and particularly a linked list implementation of a queue)- add new items to the tail, remove from the head. Whenever an item is accessed via get, remove it from the list and put it back at the tail. The problem is that removing an arbitrary item from a queue requires an O(n) traversal. Remove should be the least frequent operation, but I’d really prefer not to take all that time, especially if the cache holds a lot of small files.

There actually is a trick around this. What if, instead of hashing the values directly, you hashed an object that holds the value and two pointers- a next and a previous pointer? Basically, what if you hashed the nodes of a linked list? You’d have O(1) lookup to see whether any object is in the list, given its key. You’d just need to keep the hash map of list nodes, a pointer to the head, and a pointer to the tail, and all of your operations become O(1). So, feeling proud that I remembered an old trick I hadn’t used in over a decade, I went ahead and implemented that. Then later that day I was poking around on the internet and was introduced to LinkedHashMap- a built-in Java class that implements exactly that. It usually keeps the list in insertion order, but it has a mode to keep it in access order- basically for this exact use case. So I got to throw out an hour or two of work and testing and replace it with the built-in class.
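The access-order mode is switched on by the third constructor argument. A tiny demo of the behavior the cache relies on- after a get, the touched entry moves to the tail, so iteration runs from least- to most-recently used:

```java
import java.util.LinkedHashMap;

public class AccessOrderDemo {
    public static void main(String[] args) {
        // Third constructor argument 'true' switches from insertion order to access order
        LinkedHashMap<String, Integer> lru = new LinkedHashMap<>(16, 0.75f, true);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.put("c", 3);
        lru.get("a"); // touching "a" moves it to the tail (most recently used)
        // Iteration order is now least- to most-recently used
        System.out.println(lru.keySet()); // prints [b, c, a]
    }
}
```

Evicting on an LRU basis is then just a matter of removing the first entry the iterator hands back.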

The next major question is what to cache. We can’t cache the ValueType objects themselves- the entire point is not to use up all that RAM. As I mentioned before, we don’t strictly need to cache anything- we know the name of the file (the hash of the key), so we can always create a new File object. But as a micro-optimization we’re going to cache a File object for each item, so we don’t need to recreate it every time get() is called. We’ll also cache any basic info that would be useful down the line- the file length, the last modification time, and the hash (mainly for debugging purposes). Our cache entries will look like this:

    private class CacheEntry {
        public File file;
        public long sizeBytes;
        public int hash;
        public long modTime;
        public CacheEntry(File cacheFile, long bytes, int valueHash, long time) {
            file = cacheFile;
            sizeBytes = bytes;
            hash = valueHash;
            modTime = time;
        }
    }

The third major issue is serialization. We have these in-memory objects and we want to add them to our cache. In order to do that, we need to be able to read and write them to disk. Unfortunately, objects don’t have a toFile function. We could put a restriction on ValueType that it implement some interface like Serializable, or our own custom interface, but really we want to be able to save built-in objects like Bitmap. So instead we’ll go the route of a policy object- the cache will take a policy that defines any special operations needed for a particular ValueType: writing to disk, reading from disk, and size calculation. The user will have to define one for whatever value type they need.

package com.gabesechan.android.reusable.cache;

import java.io.File;
import java.io.IOException;


public interface CachePolicy<ValueType> {
    boolean write(File outputFile, ValueType value) throws IOException;
    ValueType read(File inputFile) throws IOException;
    long size(ValueType value);
}

For an example, here’s a cache policy for a Bitmap:

package com.gabesechan.android.reusable.cache;

import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;


public class BitmapCachePolicy implements CachePolicy<Bitmap> {
    private Bitmap.CompressFormat mFormat;
    private int mQuality;

    public BitmapCachePolicy() {
        mFormat = Bitmap.CompressFormat.PNG;
        mQuality = 100;
    }

    /* Takes the format and quality to save to disk with.  Format is a constant in the Bitmap
       class; quality is 0-100.
     */
    public BitmapCachePolicy(Bitmap.CompressFormat format, int quality){
        mFormat = format;
        mQuality = quality;
    }

    @Override
    public boolean write(File outputFile, Bitmap value) throws IOException{
        boolean success = false;
        FileOutputStream out = null;
        try {
            out = new FileOutputStream(outputFile);
            value.compress(mFormat, mQuality, out);
            success = true;
        }
        finally {
            if (out != null) {
                out.close();
            }
        }
        return success;
    }

    @Override
    public Bitmap read(File inputFile) throws IOException{
        return BitmapFactory.decodeFile(inputFile.getPath());
    }

    @Override
    public long size(Bitmap value) {
        return value.getByteCount();
    }


}

The next challenge is making the cache persistent. Since the filenames are the hash of the key, recreating the cache is as simple as listing the files in the directory. What’s lost is the LRU data. We could write that to disk each time it’s updated, but doing so would be a performance nightmare. We could do it every N operations or on a timer, but that would still lead to unpredictable performance spikes. Instead I went another route- since this is a cache and we’re willing to lose data when it fills up, it’s not the end of the world to sometimes have to redownload data. So I didn’t try to persist the LRU order at all. Instead I take a best guess at it based on the following theory- if a file is really old and still in the cache, we’re probably using it a lot. So I sort the files in reverse order of modification time and figure that, worst case, there will be some thrash early on.

The final challenge is timing entries out of the cache. Generally you don’t want to cache files forever- data may need to be updated periodically. What I don’t want is a timer that has to lock the cache, iterate through it, and kick out anything too old. Instead I perform a lazy kick out. The cache is given a max save time. When get is called, the file’s modification time is compared to the current time. If it’s too old, we kick it out of the cache and return null, just like a cache miss.
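The lazy expiry test boils down to a single comparison. A minimal standalone sketch of that check- the current time is passed in here as a parameter purely to make it testable:

```java
public class LazyExpiryDemo {
    public static final long FOREVER = -1;

    // A file is still valid if the cache never expires, or if the file's
    // modification time is within maxCacheTime of now
    static boolean isFileStillValid(long modTime, long maxCacheTime, long now) {
        return maxCacheTime == FOREVER || now - maxCacheTime < modTime;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long oneHour = 60 * 60 * 1000L;
        System.out.println(isFileStillValid(now - 30 * 60 * 1000L, oneHour, now)); // true- 30 minutes old
        System.out.println(isFileStillValid(now - 2 * oneHour, oneHour, now));     // false- two hours old
        System.out.println(isFileStillValid(now - 2 * oneHour, FOREVER, now));     // true- never expires
    }
}
```

Because the check is only run when an entry is touched, stale files linger on disk until a get or an LRU eviction reaches them- an acceptable trade for never needing a sweep timer.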

Here’s the final class:

package com.gabesechan.android.reusable.cache;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;


public class DiskCache<KeyType, ValueType> implements Cache<KeyType, ValueType> {
    public static final long FOREVER = -1;

    /*
     * CacheEntry is meant to hold info about the cached data for fast lookup.  
     */

    private class CacheEntry {
        public File file;
        public long sizeBytes;
        public int hash;
        public long modTime;
        public CacheEntry(File cacheFile, long bytes, int valueHash, long time) {
            file = cacheFile;
            sizeBytes = bytes;
            hash = valueHash;
            modTime = time;
        }
    }

    private long mMaxSize;
    private File mCacheDirectory;
    private CachePolicy<ValueType> mDiskCachePolicy;
    private long mCurrentSize;
    private LinkedHashMap<Integer, CacheEntry> mFilesInCache;
    private long mMaxCacheTime;

    public DiskCache(CachePolicy<ValueType> policy, long maxSizeBytes, long maxCacheTime, File cacheDir) {
        mMaxSize = maxSizeBytes;
        mCacheDirectory = cacheDir;
        mDiskCachePolicy = policy;
        mMaxCacheTime = maxCacheTime;
        initializeFromDisk();
    }

    private boolean isFileStillValid(long modTime) {
        return mMaxCacheTime == FOREVER || System.currentTimeMillis() - mMaxCacheTime < modTime;
    }

    private void initializeFromDisk() {
        mCacheDirectory.mkdirs();
        File[] files = mCacheDirectory.listFiles();
        if (files == null) {
            //listFiles returns null on I/O error- treat it as an empty cache
            files = new File[0];
        }
        mCurrentSize = 0;
        mFilesInCache = new LinkedHashMap<>(16, .75f, true); //Create the hash map in access order
        List<CacheEntry> allFiles = new ArrayList<>();
        //Store all the files in a list, then sort them on reverse modification time.  The idea is
        //that an older file still in the cache either is so tiny it doesn't matter or is used a lot
        for(File file : files) {
            if(!file.isDirectory() && isFileStillValid(file.lastModified())) {
                long length = file.length();
                int hashedValue;
                try {
                    hashedValue = Integer.parseInt(file.getName());
                } catch (NumberFormatException ex) {
                    continue; //Not one of our files, skip it
                }
                CacheEntry entry = new CacheEntry(file, length, hashedValue, file.lastModified());
                allFiles.add(entry);
            }
        }
        Collections.sort(allFiles, new Comparator<CacheEntry>() {
            @Override
            public int compare(CacheEntry lhs, CacheEntry rhs) {
                //We can't just return a cast of diff because of overflow.
                //We want to sort new to old, so that old files are added last to the hash and are
                //treated as MRU
                long diff = rhs.file.lastModified() - lhs.file.lastModified();
                if (diff > 0) {
                    return 1;
                } else if (diff < 0) {
                    return -1;
                }
                return 0;
            }
        });
        for(CacheEntry entry : allFiles) {
            addEntryToHash(entry);
        }
    }

    private void removeFromHash(CacheEntry entry) {
        entry.file.delete();
        mFilesInCache.remove(entry.hash);
        mCurrentSize -= entry.sizeBytes;
    }

    @Override
    public synchronized ValueType get(KeyType key) {
        int hash = key.hashCode();
        CacheEntry cachedData = mFilesInCache.get(hash);
        if( cachedData != null) {
            if(isFileStillValid(cachedData.modTime)) {
                try {
                    return mDiskCachePolicy.read(cachedData.file);
                } catch (IOException ex) {
                    //If we can't read the file, lets pretend it wasn't there
                    return null;
                }
            }
            else {
                //Not valid kick it from cache
                removeFromHash(cachedData);
            }
        }
        return null;
    }

    @Override
    public synchronized boolean put(KeyType key, ValueType value) {
        long size = mDiskCachePolicy.size(value);
        int hash = key.hashCode();

        //If the object is already cached, remove it
        CacheEntry cachedData = mFilesInCache.get(hash);
        if( cachedData != null) {
            removeFromHash(cachedData);
        }
        if(!ensureFileCanFit(size)) {
            return false;
        }

        //Perform the actual add
        File outputFile = new File(mCacheDirectory, Integer.toString(hash));
        boolean success = true;
        try {
            //Write the file.  If we can't, tell them we couldn't cache it.
            mDiskCachePolicy.write(outputFile, value);
        }
        catch(IOException ex) {
            return false;
        }
        cachedData = new CacheEntry(outputFile, size, hash, System.currentTimeMillis());
        addEntryToHash(cachedData);
        return true;
    }

    private boolean ensureFileCanFit(long newItemSize) {
        //If it can't ever fit, short circuit
        if(newItemSize > mMaxSize){
            return false;
        }
        //remove enough object so it will fit
        while(mCurrentSize + newItemSize > mMaxSize && mFilesInCache.size() > 0) {
            //Keep removing the head until we have room.
            Iterator<CacheEntry> it = mFilesInCache.values().iterator();
            CacheEntry entry = it.next();
            removeFromHash(entry);
        }
        return true;
    }

    private void addEntryToHash(CacheEntry entry) {
        mFilesInCache.put(entry.hash, entry);
        mCurrentSize += entry.sizeBytes;
    }

    // Sometimes you download a file and want to add it to the cache, but not immediately use it
    // This function allows you to do so without round tripping through memory
    public synchronized boolean put(KeyType key, File origFile) {
        long size = origFile.length();
        int hash = key.hashCode();

        //If the object is already cached, remove it
        CacheEntry cachedData = mFilesInCache.get(hash);
        if( cachedData != null) {
            removeFromHash(cachedData);
        }
        if(!ensureFileCanFit(size)) {
            return false;
        }
        File outputFile = new File(mCacheDirectory, Integer.toString(hash));
        if(!copy(origFile, outputFile)) {
            return false;
        }
        cachedData = new CacheEntry(outputFile, size, hash, System.currentTimeMillis());
        addEntryToHash(cachedData);
        return true;
    }

    private boolean copy(File src, File dst) {
        boolean success = false;
        InputStream in = null;
        OutputStream out = null;
        try {
            in = new FileInputStream(src);
            out = new FileOutputStream(dst);

            // Transfer bytes from in to out
            byte[] buf = new byte[1024];
            int len;
            while ((len = in.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
            success = true;
        }
        catch (IOException ex) {
            //Swallow the exception; success stays false and we report failure
        }
        finally {
            try {
                if (in != null) {
                    in.close();
                }
                if (out != null) {
                    out.close();
                }
            }
            catch(IOException ex){}
        }
        return success;
    }

    @Override
    public synchronized void remove(KeyType key) {
        int hash = key.hashCode();
        CacheEntry cachedData = mFilesInCache.get(hash);
        if( cachedData != null) {
            removeFromHash(cachedData);
        }
    }

    @Override
    public synchronized void clear() {
        Iterator<CacheEntry> it = mFilesInCache.values().iterator();
        while (it.hasNext()) {
            it.next().file.delete();
        }
        mFilesInCache.clear();
        mCurrentSize = 0;
    }

}
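On a device this would be constructed with BitmapCachePolicy, but to show the policy pattern in a form runnable off the Android framework, here’s a hypothetical policy for plain String values. The CachePolicy interface is repeated inside the class only so the snippet compiles on its own:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class StringPolicyDemo {
    // Repeated here so the snippet stands alone; identical to the interface above
    interface CachePolicy<ValueType> {
        boolean write(File outputFile, ValueType value) throws IOException;
        ValueType read(File inputFile) throws IOException;
        long size(ValueType value);
    }

    // Hypothetical policy: Strings stored as UTF-8 bytes on disk
    static class StringCachePolicy implements CachePolicy<String> {
        @Override
        public boolean write(File outputFile, String value) throws IOException {
            Files.write(outputFile.toPath(), value.getBytes(StandardCharsets.UTF_8));
            return true;
        }

        @Override
        public String read(File inputFile) throws IOException {
            return new String(Files.readAllBytes(inputFile.toPath()), StandardCharsets.UTF_8);
        }

        @Override
        public long size(String value) {
            return value.getBytes(StandardCharsets.UTF_8).length;
        }
    }

    public static void main(String[] args) throws IOException {
        CachePolicy<String> policy = new StringCachePolicy();
        File f = File.createTempFile("cachedemo", null);
        policy.write(f, "hello cache");
        System.out.println(policy.read(f)); // prints "hello cache"
        System.out.println(policy.size("hello")); // prints 5
        f.delete();
    }
}
```

The cache itself never looks inside the files, so any type with a sensible byte representation gets the same treatment.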
