Most of the code I’ve ever written for Android is fast enough that speed has never been a concern. Until a few weeks ago when I was working on an app that reads and analyzes large files.
While I expect users to understand that a larger file will take longer to process, at some point they will give up on my app. Either because they think the app isn’t working, or they simply aren’t willing to wait that long. So if I can cut that time in half or less, it’s certainly worth doing.
Since I’ve been using RxJava for all of my background jobs, it made sense to try to do the same for this. Essentially I have some code that looks like this:
So, I just tried to stick each iteration of the loop into a background thread. You can see what I did below:
It did speed it up, it was running in about half the time. But, it was causing a lot of garbage collection (GC), which made the UI very stuttery and janky while it was loading. To try to understand what was happening, I added a log to print out
Thread.currentThread().getName(). This made me realize that I was actually spawning a new thread for each piece of data. As it turns out, spawning thousands of threads is not a good idea.
I’ve already accomplished my goal of speeding up the data processing, but it’s not running very smoothly. I was also wondering if it could run even faster if it didn’t cause so much GC. So instead of spawning a new thread, I created my own limited thread pool for RxJava to use:
For a dataset where each piece of data was quite large, this resulted in a further time savings of roughly 10%. However, for a dataset where each piece of data was small, it reduced the processing time by roughly 30%. It also reduced the number of GC calls, but there were still too many.
I had one more idea - what if the overhead of managing and switching threads was slowing it down? To get around this, I could group my data into one sublist per thread. That way it would still run in parallel, but the management of each thread would be minimal. I adapted the solution here to implement this idea:
Previously I mentioned testing with two datasets. The first had large individual items but fewer total items. The second had many more total items, but each item was smaller. When I ran this solution against the first dataset, the difference was negligible. But, when I ran it on the second, longer dataset, the change was dramatic. With my previous approach it took about 20 seconds to process this dataset. Now, it takes about 5 seconds.
The reason the second dataset was sped up so much, is because of the number of data items. For each item, it has to schedule a thread to do the work. Now that I’ve reduced this scheduling work, there’s very little overhead.
There’s a few places in my code where I need to do this parallel work. So I refactored this solution into a util method that can be called anywhere:
T is the type being processed, in the example that would be
DataModel. You pass in a
List<T> to be processed, and expect a result
U. In my example
List<DataModel>, but it could be anything, it doesn’t have to be a list either. The worker function being passed in is what does the processing of each sublist, and returns a result.
Can we make it faster?
There are many factors at play here that affect how fast this can run. Like how the threads are managed, how many threads are available, what kind of device the app is running on, etc. Most of these I can’t control, but there was something else I didn’t consider with my solution.
What if the data isn’t split evenly? For example, if I have 4 threads, what if every data item assigned to the 4th thread is 10 times larger than any other item? Then the 4th thread will end up running roughly 10 times longer than any other thread. In that case I won’t save much time with multithreading. The second approach I had actually solves this problem, since a thread is only assigned as it becomes available. But that solution was too slow.
I also tried to change how the data is grouped into sublists. Instead of doing it arbitrarily, I could keep track of the total size of the data in each group. Then, assign each data item to whichever list is the smallest. This way, the amount of work done by each thread is close to being equal. Unfortunately, the time it takes to do this extra setup usually costs more time than it saves.
Depending on how unevenly you expect your data to be split up, running it this way may actually be faster. But for the most part, it seems like just randomly splitting up the data is faster. Ideally if each thread could be assigned when available, with minimal overhead, it would be the most efficient solution. But, I couldn’t find a way to reduce the overhead enough to make it worth it.
So if you need to run some code in parallel, here’s one approach. Let me know what you think, there are many variables at play here. It’s difficult to determine the optimal solution, if there even is one. Also remember, just because you can run something in parallel doesn’t mean you should.