Machine Learning Diary 001: Swift detour, and Recreating FastAI's Collaborative Filtering
I've been spending quite a bit of my free time studying machine learning over the past three months. After some deliberation about which learning materials to start with, I decided on fast.ai's Practical Deep Learning for Coders (I took the 2019 version). Overall I highly recommend the course, and I'm planning to go back and do the 2020 version of it.
One reason I was excited about fast.ai's course is that it had some content on Swift for TensorFlow (S4TF). I have written a lot of Python, and while it has its place, I was excited to try something new and hopefully better. I watched the two fast.ai videos on S4TF and then decided to try it out. This ended up being a fairly costly mistake. S4TF is just not nearly as mature as PyTorch, and I wasted a lot of time struggling with environment setup (there's no equivalent to conda install), bugs, very incomplete documentation, and cryptic error messages. As of when I tried it (September 2020), I'd suggest steering clear of S4TF if your primary goal is just to learn ML. That said, it was fun to learn some Swift, and I'd definitely consider using Swift in future projects, ML or otherwise.
With that multi-week detour out of the way, I decided to focus on PyTorch. For my first task, I decided to recreate the simplest model presented in the fast.ai course without using the fastai library at all, just pure PyTorch. I found that almost all of the special functionality in fastai is available in PyTorch as of 1.6, so this was a lot easier than I expected. Here's my final notebook for this collaborative filtering task. Below is a rough sketch of the model, followed by some of the issues I ran into and how I resolved them.
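This isn't the exact code from the notebook, just a minimal sketch of the idea: each user and each movie gets a learned embedding plus a bias, the prediction is their dot product, and a sigmoid squashes the result into the rating range. The class name, embedding size, and rating range here are illustrative.

```python
import torch
from torch import nn

class DotProductBias(nn.Module):
    """Collaborative filtering as a dot product of user and movie embeddings."""

    def __init__(self, n_users, n_movies, n_factors=50, y_range=(0.5, 5.5)):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.user_bias = nn.Embedding(n_users, 1)
        self.movie_factors = nn.Embedding(n_movies, n_factors)
        self.movie_bias = nn.Embedding(n_movies, 1)
        self.y_range = y_range

    def forward(self, x):
        # x is a (batch, 2) LongTensor of [user_id, movie_id] pairs
        users, movies = x[:, 0], x[:, 1]
        dot = (self.user_factors(users) * self.movie_factors(movies)).sum(dim=1)
        res = dot + self.user_bias(users).squeeze(1) + self.movie_bias(movies).squeeze(1)
        # Squash predictions into the rating range
        lo, hi = self.y_range
        return torch.sigmoid(res) * (hi - lo) + lo
```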
- For this task, using the GPU was much slower than just using the CPU. I suppose this is because the model is so simple that the overhead of copying data to and from the GPU dominates any speedup in the actual calculations. In retrospect, this is probably also why training my Swift version of the model was so slow. S4TF doesn't allow eager tensors to use the CPU if a GPU is present, so this would have been much harder to discover and work around in Swift. (There's a rough sketch of how to time this on CPU vs. GPU after this list.)
- Fastai wraps the torch.nn.Embedding type with a function that changes the initial values. PyTorch initializes embedding weights with values drawn from a normal distribution with σ² = 1. Fastai changes this to a normal distribution with σ² = 0.01, truncated so that no value is more than two standard deviations from the mean. They reference "An Exploration of Word Embedding Initialization in Deep-Learning Tasks" as justification. I found this provided a small boost in performance. (A sketch of this initialization is after this list.)
- I forgot calls to squeeze() in a few places. I was quite surprised that torch.nn.MSELoss didn't throw an error when I passed it two tensors of different shapes. Instead it broadcast them and averaged over all the resulting values, which led to poor training performance. (There's a concrete example of this after this list.)
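To see the CPU vs. GPU difference for yourself, something like the following works. The details (step count, optimizer, learning rate) are made up, but the torch.cuda.synchronize() calls matter: CUDA work is launched asynchronously, so naive timing can make the GPU look faster than it really is.

```python
import time
import torch

def time_steps(model, xb, yb, device, n_steps=100):
    """Roughly time n_steps of forward/backward/update on the given device."""
    model = model.to(device)
    xb, yb = xb.to(device), yb.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()
    if device.type == "cuda":
        torch.cuda.synchronize()  # make sure the copies above have finished
    start = time.perf_counter()
    for _ in range(n_steps):
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return time.perf_counter() - start

# e.g. compare time_steps(model, xb, yb, torch.device("cpu"))
#      with    time_steps(model, xb, yb, torch.device("cuda"))
```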
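For the embedding initialization, newer PyTorch releases ship nn.init.trunc_normal_ (I don't believe it was in 1.6, so treat this as a sketch). Using the values described above (σ² = 0.01, i.e. σ = 0.1, truncated at two standard deviations), it looks something like:

```python
from torch import nn

def init_embedding(emb: nn.Embedding, std: float = 0.1) -> nn.Embedding:
    """Re-initialize an embedding with a truncated normal, fastai-style.

    PyTorch's default for nn.Embedding is N(0, 1); this draws from a much
    tighter normal and cuts it off at two standard deviations.
    """
    nn.init.trunc_normal_(emb.weight, mean=0.0, std=std, a=-2 * std, b=2 * std)
    return emb

# e.g. in the model's __init__:
#   self.user_factors = init_embedding(nn.Embedding(n_users, n_factors))
```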
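And to make the squeeze() issue concrete: with a 1-D prediction tensor and a 2-D target tensor, MSELoss happily broadcasts the two and averages over every pairwise difference (recent PyTorch versions at least print a warning about the size mismatch). The shapes below are just for illustration.

```python
import torch

loss_fn = torch.nn.MSELoss()

preds = torch.rand(64)       # model output, shape (64,)
targets = torch.rand(64, 1)  # ratings column straight from the data, shape (64, 1)

# No error: (64,) and (64, 1) broadcast to (64, 64), so this averages
# 64 * 64 pairwise differences instead of 64 matching ones.
bad_loss = loss_fn(preds, targets)

# Squeezing the extra dimension gives the loss I actually wanted.
good_loss = loss_fn(preds, targets.squeeze(1))
```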