Forem: Zac J.Q. Yap

Reuse Data for Training During Upstream CPU Bottlenecks

Zac J.Q. Yap — Tue, 26 May 2020 13:33:11 +0000

Upstream operations in a neural network training pipeline (e.g. disk I/O and data preprocessing) do not run on hardware accelerators. As a result, they can become the bottleneck and leave the accelerator idle.

“Data echoing” reuses intermediate outputs from earlier pipeline stages when the training pipeline has an upstream bottleneck. This maximises hardware utilisation. The number of times each piece of data is reused is called the echoing factor. The effectiveness of this approach challenges the idea that repeated data is useless or even harmful for SGD updates.

Echoing can be done:

Before batching – data is repeated and shuffled at the training-example level, which increases the likelihood that nearby batches will be different. The risk is that an example may be duplicated within a batch.
After batching – whole batches are repeated and shuffled, so each repeat of a batch is identical to the original.
Before augmentation – repeated data can be transformed differently, potentially making it more akin to fresh data.
After augmentation – other sources of noise in the SGD update, such as dropout, can still make repeated data appear different.
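A minimal sketch of the two batching positions follows. It uses plain Python generators rather than a production input pipeline. The function names (`echo`, `shuffle_buffer`, `batch`, `echo_before_batching`, `echo_after_batching`) and the buffer-based shuffle are illustrative assumptions, not the paper's code. In a real pipeline the echo stage would sit directly after the slow upstream step, so that each expensive output is reused `echo_factor` times.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def echo(items: Iterable[T], echo_factor: int) -> Iterator[T]:
    """Yield every upstream item `echo_factor` times in a row."""
    for item in items:
        for _ in range(echo_factor):
            yield item


def shuffle_buffer(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximate shuffling with a bounded buffer (an assumed, simple scheme)."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end and emit it, keeping memory bounded.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    rng.shuffle(buffer)  # flush whatever is left, in random order
    yield from buffer


def batch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items into lists of `batch_size` (the last batch may be smaller)."""
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


def echo_before_batching(examples: Iterable[T], echo_factor: int,
                         batch_size: int, buffer_size: int) -> Iterator[List[T]]:
    # Example-level echoing: repeats are shuffled in among other examples,
    # so one batch may contain the same example more than once.
    repeated = shuffle_buffer(echo(examples, echo_factor), buffer_size)
    return batch(repeated, batch_size)


def echo_after_batching(examples: Iterable[T], echo_factor: int,
                        batch_size: int, buffer_size: int) -> Iterator[List[T]]:
    # Batch-level echoing: whole batches are repeated, then shuffled as units.
    batches = batch(examples, batch_size)
    return shuffle_buffer(echo(batches, echo_factor), buffer_size)
```

With 8 fresh examples, an echoing factor of 2 and a batch size of 4, both variants produce 4 training batches from only 8 upstream reads. A larger `buffer_size` mixes repeated items more thoroughly. That connects to the observation below that echoing works better with more shuffling.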

Data echoing reduces both the number of fresh examples required for training and the training time, without harming predictive performance (up to an upper bound on the echoing factor). There is also empirical evidence that data echoing performs better with larger batch sizes and more shuffling.

Source paper: https://arxiv.org/pdf/1907.05550.pdf

Liked this post?
This summary first appeared in the Pragmatic CS newsletter. Subscribers got it first!

Interview With Alex Pareto from Brex – Scaling, Thundering Herd Problem, Dev Tools, Elixir and Functional Programming

Zac J.Q. Yap — Mon, 18 May 2020 05:35:46 +0000

Alex is the former Head of Engineering at video-first e-commerce app NTWRK and co-founder of a Y Combinator-backed startup.

Who are you and what’s your backstory?

Hey, I’m Alex! I work on engineering at Brex. We’re reinventing financial systems to help ambitious companies scale. I work on building out the Card product.

Before Brex, I led the engineering team at NTWRK, a video-first e-commerce app. NTWRK releases new goods in collaboration with popular brands and celebrities. During my time there we worked with Drake, LeBron James, Nike, and many others. We often dealt with interesting technical challenges around scaling and thundering herds.

Before NTWRK, I co-founded Demeanor.co, a Y Combinator-backed startup. Demeanor was a platform for celebrities and internet creators to create custom merchandise. During my time at Demeanor, I focused a lot on how to build software in a way that let us iterate and ship quickly.

I have also spent time working on software at Facebook and a few small startups. Back at USC, I started Scope, a CS organization for making things with other students.

I write (occasionally) at alexpareto.com, about my thoughts on software and startups.

You mentioned scaling and thundering herd challenges during your time at NTWRK. Can you talk us through how you dealt with them?

Definitely! At NTWRK every show is live. We send a push notification to interested users just before the show starts, and everyone watches the show concurrently. This means traffic patterns are very spiky: traffic can increase by 100x in a matter of seconds. At that rate of increase, autoscaling can't keep up.

Addressing spiky traffic takes a few steps. The first is to cache liberally (preferably on the edge using a CDN like Cloudflare or CloudFront). This helps ease the load on the servers and database quite a bit.

A massive number of users online concurrently means that every time the cache expires, a host of requests cascade onto the servers asking for the same data.

All these requests hitting the database can cause a big slowdown. The trick here is to note that all the requests are asking for the same data. So instead of letting all the requests cascade to the database to recalculate the same data thousands of times, the solution is to recalculate the data once and share it with all the requests.

Redis works well for this. It can ensure that only one request goes to the database when a complex query needs to be recalculated, while all other requests read the result from Redis.

If you're interested, this article on High Scalability does a nice job describing various caching techniques and this video from Facebook does a nice job of describing the thundering herd problem.

Editor’s Note: Alex also wrote an extremely helpful post for scaling stage-by-stage up to 100k users.

You said that back at Demeanor you focused on building software in a way that lets you iterate and ship quickly. What’s the difference between that and the approach you have to take now, and the implications of the different approaches?

There are a few different stages of companies, and at each stage shipping fast and iterating is important. With that said, there are different goals at different stages.

An early-stage, zero-to-one company like Demeanor is about iterating fast to find product-market fit and make something people want. We sacrificed things like stability and broad feature sets to get something into people's hands quickly. For example, during the early stages of the company we had no staging environment, feature flags, or QA testing.

Growth-stage companies like NTWRK operate at a larger scale, both in users and engineers. Shipping fast is important, but so are stability and maintainability. This means having feature flags, interfaces for third-party libraries, proper integration and unit testing, and other practices. I think these practices take a bit more time up front, but they help teams ship faster once the team grows. New engineers can onboard faster, people have the confidence to ship changes, libraries can be swapped out fast, and new features can be added quickly.

Different tools and techniques for different stages and goals!

What are you excited about right now in CS?

I’m really excited about the advancements in developer tooling made in recent years. There are some exciting companies in the space making it much easier to ship very scalable software. I’ve used both Vercel and Render in the past few months - they are a blast to work with.

Advancements like these lower the barrier to entry for people to start or build something new. With less capital needed, and no need to learn the nuances of AWS or a Linux box off the bat, we'll start to see a lot more startups and products launch in the coming years.

What was the last thing you learned?

The most recent thing I’ve learned about is functional programming. We use Elixir at Brex which meant I had to break some old object-oriented instincts when I joined. Lots of the paradigms around object-oriented programming get thrown out the window with functional programming.

The key thing to keep in mind is that functional programming is all about taking data and transforming it.

If you’re interested, Elixir is a fantastic language to get started with functional programming. Programming Elixir by Dave Thomas is a great introduction to the language.

Editor’s Note: Check out Why Brex Chose Elixir and also How Discord Scaled Elixir to 5 million Concurrent Users

For me, the quickest way to learn something is by doing (and deep work!). Reading a book like Programming Elixir is great for getting a background. After establishing a background, diving in and writing code forces one to start thinking and solving problems in a "functional" way. I would encourage anyone interested to build a side project (even if it's for personal use) in a functional language like Elixir to get started with it.

Who inspires you as a software engineer? Why?

I’m a big fan of Cal Newport’s writing on deep work (he also happens to be a CS professor).

Writing great software is a craft that requires intense focus. In his writing, Cal talks a lot about how to cut distractions to open up periods of time for this intense focus. He argues that these periods of time are where we'll do our best work.

I've found this to be very true in practice - even outside of software engineering. Whether writing strategy plans or writing software, the way to do the task well is to set aside some time to focus on that and only that. I block out an hour or two each day to focus - and turn off Slack, messages, notifications, and e-mail for the whole block of time.

Over time, I've noticed that many people I know use this habit to get more high-quality work done faster. If you're interested, Deep Work by Cal Newport is a great place to start!

--
Liked this interview? Check out more interviews with CTOs and Tech Leads, along with actionable CS research and software engineering best practices at my newsletter: pragmaticcs.substack.com

Scaling your web app up to 100K users and beyond

Zac J.Q. Yap — Sun, 17 May 2020 08:50:47 +0000

Khan Academy scaled to handle 2.5x their web traffic within a week by using a serverless architecture and CDN caching of all static data. They also extensively cached common queries, user preferences and session data.

You can also take a bottleneck-centric approach to scaling: check your resource monitoring to identify the bottleneck. It is usually the database first, but bottlenecks can also be memory, CPU, network I/O or disk I/O.

As a principle, make the web stack do less work for the most common requests.

Some ideas:

Cache database queries
Index the database
Move session storage to an in-memory caching tool
Cache HTML fragments
Use queues and more workers
Use HTTP caching headers
Add a Content Delivery Network in front of a static file host
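The first idea, caching database queries, can be illustrated with a minimal in-process time-to-live (TTL) cache decorator. The standard library's `functools.lru_cache` has no expiry option, so the sketch keeps its own timestamps. `ttl_cache` and `top_products` are hypothetical names, and the fake clock is there only to make expiry easy to demonstrate. In a multi-server deployment the store would normally live in a shared cache such as Redis or Memcached rather than a Python dict.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Tuple


def ttl_cache(ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
    """Cache a function's results per positional-argument tuple for `ttl_seconds`."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        store: Dict[Tuple[Any, ...], Tuple[float, Any]] = {}

        @wraps(fn)
        def wrapper(*args: Any) -> Any:
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]           # fresh enough: skip the database
            value = fn(*args)           # miss or stale: run the real query
            store[args] = (now, value)
            return value
        return wrapper
    return decorator


# Usage with a fake clock so that expiry is easy to see.
fake_now = [0.0]
db_calls = []


@ttl_cache(ttl_seconds=60, clock=lambda: fake_now[0])
def top_products(category: str) -> list:
    db_calls.append(category)  # stands in for an expensive SQL query
    return [f"{category}-1", f"{category}-2"]


top_products("shoes")   # miss: hits the "database"
top_products("shoes")   # hit: served from the cache
fake_now[0] = 61.0
top_products("shoes")   # stale after 60 s: hits the "database" again
print(len(db_calls))    # prints: 2
```

The TTL is the trade-off knob. A longer TTL means fewer database hits but staler data.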

Here's a guide for scaling up to 11M users, stage by stage:

Early on, use vertical scaling, but note that it has no failover or redundancy.

At >1K users: Add multiple availability zones, load balancers, and a standby (slave) database in RDS.

At 10K-100K users: Scale instances horizontally. Move static content to S3, and even some dynamic content to the CloudFront CDN. Add more read replicas of the database in RDS. Shift session state off your web tier and store it in ElastiCache or DynamoDB.

At >500K users: Add automation tools and decouple infrastructure. Add monitoring, metrics and logging.

At >10M users: Use federation and sharding, and explore other types of databases.
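To make the sharding step a little more concrete, here is a minimal sketch of key-based sharding. Each user's data lives on one of several database shards, chosen by a stable hash of the user ID. The shard names and the `shard_for` helper are hypothetical. SHA-256 is used instead of Python's built-in `hash()`, because the latter is randomized per process for strings and would route the same user differently after a restart. Federation, by contrast, splits databases by function (for example users vs. products) rather than by key.

```python
import hashlib
from typing import Sequence

# Hypothetical shard names; in practice these would be connection strings.
SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]


def shard_for(user_id: str, shards: Sequence[str]) -> str:
    """Route a key to a shard using a stable hash modulo the shard count."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Every request for the same user goes to the same shard.
print(shard_for("user-42", SHARDS))
```

Plain modulo hashing remaps most keys when the number of shards changes. Consistent hashing is a common way to reduce that data movement.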

--
Found this helpful? I write a weekly newsletter on actionable CS research and software engineering best practices.

Cheers,
Zac

Google's guide and chart for where to implement application logic and rendering

Zac J.Q. Yap — Sat, 16 May 2020 15:40:33 +0000

Source: https://developers.google.com/web/updates/2019/02/rendering-on-the-web

I found this chart, from a February 2019 Google Developers update, very helpful for making architecture decisions and choosing frameworks (React frameworks like Next.js, Gatsby, etc.) instead of just going with the most heavily marketed and hyped ones!

Note the trade-offs between performance, SEO and overhead costs. Personally, though, I don't think the Time to First Byte (TTFB) metric they use is that significant a consideration.

“Trisomorphic” rendering, which the article mentions but which isn't discussed as widely, seems promising:
Use streaming server rendering for initial and non-JS navigations, then have your service worker take over rendering HTML for navigations. This keeps cached components and templates up to date and enables SPA-style navigations to render new views in the same session. It works best when the server, client page, and service worker share the same templating and routing code.

If you are doing client-side rendering, make sure you implement aggressive code splitting and lazy-load JavaScript!

--
If you found this helpful, I run a newsletter featuring more of such content at: https://pragmaticcs.substack.com/

Cheers,
Zac