<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: David Haley</title>
    <description>The latest articles on Forem by David Haley (@dchaley).</description>
    <link>https://forem.com/dchaley</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1186907%2F99d60363-9354-49c3-a55f-75d14e77e6a5.jpg</url>
      <title>Forem: David Haley</title>
      <link>https://forem.com/dchaley</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dchaley"/>
    <language>en</language>
    <item>
      <title>Auto-loading .nvmrc in JetBrains Junie terminal</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Tue, 23 Dec 2025 06:06:59 +0000</pubDate>
      <link>https://forem.com/dchaley/auto-loading-nvmrc-in-junie-terminal-48pc</link>
      <guid>https://forem.com/dchaley/auto-loading-nvmrc-in-junie-terminal-48pc</guid>
      <description>&lt;p&gt;I've been using JetBrains's &lt;a href="https://www.jetbrains.com/junie/" rel="noopener noreferrer"&gt;Junie&lt;/a&gt; product for agentic AI coding since it went public in April 2025. It's been an overall great experience.&lt;/p&gt;

&lt;p&gt;I noticed it was struggling with a React/TypeScript project. When running &lt;code&gt;yarn&lt;/code&gt; commands, it wasn't honoring the &lt;code&gt;.nvmrc&lt;/code&gt; file, so it didn't load the correct &lt;code&gt;node&lt;/code&gt; version. This caused issues with files generated/built against different Node.js versions.&lt;/p&gt;
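&lt;p&gt;To make the mismatch concrete, here's a minimal sketch of the check involved (the pinned and active versions below are illustrative placeholders, not values from my project):&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: compare a .nvmrc-style pinned version against the active node.
# Both values are hard-coded placeholders for illustration.
pinned="20"          # what .nvmrc would pin
active="v18.19.0"    # what 'node --version' would report
case "$active" in
  "v${pinned}".*) echo "node matches .nvmrc" ;;
  *) echo "mismatch: .nvmrc wants ${pinned}, running ${active}" ;;
esac
```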

&lt;p&gt;My setup is slightly unusual: NVM takes a while to load, and I'm impatient, so I don't load it automatically in my shell. Whereas the usual setup would be,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"/opt/homebrew/opt/nvm/nvm.sh"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\.&lt;/span&gt; &lt;span class="s2"&gt;"/opt/homebrew/opt/nvm/nvm.sh"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;my &lt;code&gt;.zsh-local-mac&lt;/code&gt; (included from &lt;code&gt;.zshrc&lt;/code&gt;) sets up an alias:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;alias &lt;/span&gt;load-nvm&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'[ -s "/opt/homebrew/opt/nvm/nvm.sh" ] &amp;amp;&amp;amp; \. "/opt/homebrew/opt/nvm/nvm.sh"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without that line, Junie doesn't have NVM enabled in its terminal. But just loading NVM isn't enough: we also need to honor the &lt;code&gt;.nvmrc&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;Junie helpfully sets the &lt;code&gt;$TERMINAL_EMULATOR&lt;/code&gt; environment variable to &lt;code&gt;JetBrains-JediTerm&lt;/code&gt;. So I added this to my shell config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if [[ "$TERMINAL_EMULATOR" == "JetBrains-JediTerm" ]]; then
  load-nvm

  autoload -U add-zsh-hook
  load-nvmrc() {
    local node_version="$(nvm version)"
    local nvmrc_path="$(nvm_find_nvmrc)"

    if [ -n "$nvmrc_path" ]; then
      local nvmrc_node_version=$(nvm version "$(cat "${nvmrc_path}")")

      if [ "$nvmrc_node_version" = "N/A" ]; then
        nvm install
      elif [ "$nvmrc_node_version" != "$node_version" ]; then
        nvm use
      fi
    elif [ "$node_version" != "$(nvm version default)" ]; then
      echo "Reverting to nvm default version"
      nvm use default
    fi
  }
  add-zsh-hook chpwd load-nvmrc
  load-nvmrc
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After making this change, Junie terminals honor &lt;code&gt;.nvmrc&lt;/code&gt; files. This is vital for running tests, e.g. &lt;code&gt;yarn jest src/file.test.tsx&lt;/code&gt;, allowing the agent to iterate more intelligently on its progress.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>jetbrains</category>
      <category>programming</category>
      <category>shell</category>
    </item>
    <item>
      <title>Deploying to Firebase Hosting + Firestore from GitHub actions</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Mon, 06 Oct 2025 03:26:15 +0000</pubDate>
      <link>https://forem.com/dchaley/deploying-to-firebase-hosting-firestore-from-github-actions-5g52</link>
      <guid>https://forem.com/dchaley/deploying-to-firebase-hosting-firestore-from-github-actions-5g52</guid>
      <description>&lt;p&gt;I recently set up a GitHub action to deploy Firebase after pull request merge. It's a tremendous time-saver. Previously, I was deploying from my dev machine, doing some toil to switch between a development environment (emulators) and the production environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project setup
&lt;/h2&gt;

&lt;p&gt;I use environment variables to control the various Firebase settings (project/app ID, API key, etc.). Note that the Firebase API key is not a secret: it identifies the project, and access is governed by your security rules. I put these in &lt;code&gt;.envrc&lt;/code&gt; on my local machine, but the action needs a bit more help setting up the environment.&lt;/p&gt;
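&lt;p&gt;For reference, a sketch of what that &lt;code&gt;.envrc&lt;/code&gt; might look like (every value below is a placeholder, not a real project value):&lt;/p&gt;

```shell
# Sketch of a direnv-style .envrc with the non-secret Firebase settings.
# All values are placeholders for illustration.
export FIREBASE_PROJECT_ID="my-project"
export FIREBASE_APP_ID="1:1234567890:web:abc123"
export FIREBASE_API_KEY="AIzaPlaceholderKey"
export FIREBASE_AUTH_DOMAIN="my-project.firebaseapp.com"
export FIREBASE_STORAGE_BUCKET="my-project.appspot.com"
export FIREBASE_MESSAGING_SENDER_ID="1234567890"
export FIREBASE_USE_EMULATORS="true"   # flip to "false" for production
```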

&lt;p&gt;I have a script that uses &lt;code&gt;jq&lt;/code&gt; to create JSON files from templates. For example, &lt;code&gt;write-firebase-config.sh&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# Write the overall firebase config:&lt;/span&gt;

jq &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--arg&lt;/span&gt; FIREBASE_PROJECT_ID &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FIREBASE_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; .firebaserc.jq &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .firebaserc

&lt;span class="c"&gt;# Write the json file loaded by the kotlin-angular build:&lt;/span&gt;

jq &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--arg&lt;/span&gt; FIREBASE_PROJECT_ID &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FIREBASE_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--arg&lt;/span&gt; FIREBASE_APP_ID &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FIREBASE_APP_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--arg&lt;/span&gt; FIREBASE_STORAGE_BUCKET &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FIREBASE_STORAGE_BUCKET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--arg&lt;/span&gt; FIREBASE_API_KEY &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FIREBASE_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--arg&lt;/span&gt; FIREBASE_AUTH_DOMAIN &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FIREBASE_AUTH_DOMAIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--arg&lt;/span&gt; FIREBASE_MESSAGING_SENDER_ID &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FIREBASE_MESSAGING_SENDER_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--arg&lt;/span&gt; FIREBASE_USE_EMULATORS &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FIREBASE_USE_EMULATORS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; webApp/src/jsMain/resources/firebase-config.json.jq &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; webApp/src/jsMain/resources/firebase-config.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The template files are quite simple; here's the one for &lt;code&gt;.firebaserc&lt;/code&gt; (used by the Firebase CLI):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "projects": {
    "default": "\($FIREBASE_PROJECT_ID)"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You also need your Firebase environment configured in the client. This particular project builds Angular via Gradle (it's a long story; see also &lt;a href="https://dev.to/dchaley/series/25958"&gt;Kotlin in the Browser&lt;/a&gt;). But I used the same JSON format that AngularFire recommends. Here's the &lt;code&gt;firebase-config.json.jq&lt;/code&gt; template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "projectId": "\($FIREBASE_PROJECT_ID)",
  "appId": "\($FIREBASE_APP_ID)",
  "apiKey": "\($FIREBASE_API_KEY)",
  "authDomain": "\($FIREBASE_AUTH_DOMAIN)",
  "storageBucket": "\($FIREBASE_STORAGE_BUCKET)",
  "messagingSenderId": "\($FIREBASE_MESSAGING_SENDER_ID)",
  "useEmulators": "\($FIREBASE_USE_EMULATORS)"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
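&lt;p&gt;You can sanity-check the interpolation directly on the command line. Here the template is inlined as a program string rather than loaded with &lt;code&gt;-f&lt;/code&gt;, and the project ID is a placeholder:&lt;/p&gt;

```shell
# Inline jq demo of the \($VAR) interpolation used by the templates.
# "demo-project" is a placeholder value.
jq -n --arg FIREBASE_PROJECT_ID "demo-project" \
  '{"projectId": "\($FIREBASE_PROJECT_ID)"}'
# prints a JSON object with projectId set to "demo-project"
```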



&lt;h2&gt;
  
  
  Repository setup
&lt;/h2&gt;

&lt;p&gt;Set up a target environment (e.g. "Production") and populate it with values from the Firebase console:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;FIREBASE_PROJECT_ID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;FIREBASE_APP_ID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;FIREBASE_STORAGE_BUCKET&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;FIREBASE_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;FIREBASE_AUTH_DOMAIN&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;FIREBASE_MESSAGING_SENDER_ID&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also, create a secret named &lt;code&gt;FIREBASE_SERVICE_ACCOUNT_BASE64&lt;/code&gt; containing a base64-encoded, newly exported JSON service account key (see below).&lt;/p&gt;
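&lt;p&gt;One way to produce that secret value, assuming a key file exported from the console (the filename and the &lt;code&gt;gh&lt;/code&gt; step below are illustrative):&lt;/p&gt;

```shell
# Sketch: base64-encode the exported service-account key for the secret.
# "service-account.json" is a placeholder filename.
KEY_B64="$(base64 -w0 service-account.json)"   # GNU coreutils; on macOS use: base64 -i service-account.json
echo "$KEY_B64"
# Paste the output into the FIREBASE_SERVICE_ACCOUNT_BASE64 secret, or push
# it with the GitHub CLI: gh secret set FIREBASE_SERVICE_ACCOUNT_BASE64 --body "$KEY_B64"
```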

&lt;h2&gt;
  
  
  GitHub action
&lt;/h2&gt;

&lt;p&gt;The action is a straightforward series of steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check out the repo&lt;/li&gt;
&lt;li&gt;Generate config (as above)&lt;/li&gt;
&lt;li&gt;Install Firebase CLI&lt;/li&gt;
&lt;li&gt;Build the app with gradle&lt;/li&gt;
&lt;li&gt;Deploy web app to Hosting&lt;/li&gt;
&lt;li&gt;Deploy Firestore rules&lt;/li&gt;
&lt;li&gt;Clean up&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Services deployed
&lt;/h2&gt;

&lt;p&gt;The following Firebase services are deployed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hosting

&lt;ul&gt;
&lt;li&gt;Provides the main web app&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Firestore (rules)

&lt;ul&gt;
&lt;li&gt;Defines the database security rules&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Service account &amp;amp; permissions required
&lt;/h2&gt;

&lt;p&gt;Create a new service account in the GCP IAM console panel. Call it something like "GitHub deploy", and only use it for GitHub action deploys.&lt;/p&gt;

&lt;p&gt;The permissions are a bit trickier. The easy way out is to make the service account an overall admin, but consider following the principle of least privilege to limit the impact of a malicious or mistaken actor.&lt;/p&gt;

&lt;p&gt;Through trial and error, I believe this is the minimal set of roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Firebase Hosting Admin

&lt;ul&gt;
&lt;li&gt;Needed to deploy to Hosting&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Firebase Rules Admin

&lt;ul&gt;
&lt;li&gt;Needed to deploy Rules (for Firestore)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Service Account User

&lt;ul&gt;
&lt;li&gt;Needed to act as the service account&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Service Usage Consumer

&lt;ul&gt;
&lt;li&gt;Needed to test if APIs are active&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Full YAML source
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to Firebase on merge&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build_and_deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Use Node.js&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./.nvmrc'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Generate .firebaserc&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;./write-firebase-config.sh&lt;/span&gt;
          &lt;span class="s"&gt;cat .firebaserc&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;FIREBASE_PROJECT_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ vars.FIREBASE_PROJECT_ID }}&lt;/span&gt;
          &lt;span class="na"&gt;FIREBASE_APP_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ vars.FIREBASE_APP_ID }}&lt;/span&gt;
          &lt;span class="na"&gt;FIREBASE_STORAGE_BUCKET&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ vars.FIREBASE_STORAGE_BUCKET }}&lt;/span&gt;
          &lt;span class="na"&gt;FIREBASE_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ vars.FIREBASE_API_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;FIREBASE_AUTH_DOMAIN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ vars.FIREBASE_AUTH_DOMAIN }}&lt;/span&gt;
          &lt;span class="na"&gt;FIREBASE_MESSAGING_SENDER_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ vars.FIREBASE_MESSAGING_SENDER_ID }}&lt;/span&gt;
          &lt;span class="na"&gt;FIREBASE_USE_EMULATORS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Firebase CLI&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;npm install -g firebase-tools&lt;/span&gt;
          &lt;span class="s"&gt;firebase --version&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Angular app&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;./gradlew webApp:buildProductionWebApp&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy hosting&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;echo "${{ secrets.FIREBASE_SERVICE_ACCOUNT_BASE64 }}" | base64 --decode &amp;gt; "google-application-credentials.json"&lt;/span&gt;
          &lt;span class="s"&gt;firebase deploy --only hosting --non-interactive&lt;/span&gt;
          &lt;span class="s"&gt;rm -rf "google-application-credentials.json"&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google-application-credentials.json"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy Firestore rules&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;echo "${{ secrets.FIREBASE_SERVICE_ACCOUNT_BASE64 }}" | base64 --decode &amp;gt; "google-application-credentials.json"&lt;/span&gt;
          &lt;span class="s"&gt;firebase deploy --only firestore:rules --non-interactive&lt;/span&gt;
          &lt;span class="s"&gt;rm -rf "google-application-credentials.json"&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google-application-credentials.json"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Cleanup credentials&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always()&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;rm -rf "google-application-credentials.json"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Happy Firebasing!&lt;/p&gt;

</description>
      <category>githubactions</category>
      <category>firebase</category>
      <category>webdev</category>
      <category>cicd</category>
    </item>
    <item>
      <title>Quiz: Ruby &amp; Rspec scoping</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Fri, 25 Jul 2025 19:51:32 +0000</pubDate>
      <link>https://forem.com/dchaley/quiz-ruby-rspec-scoping-3gnh</link>
      <guid>https://forem.com/dchaley/quiz-ruby-rspec-scoping-3gnh</guid>
      <description>&lt;p&gt;Consider this test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      context "scoping" do
        let(:var) { 1 }

        it "is tricky" do
          var = var + 1
          expect(var).to eq(2)
        end
      end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Will this pass or fail?&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>programming</category>
      <category>beginners</category>
      <category>learning</category>
    </item>
    <item>
      <title>Kotlin in the browser: attempting Firebase + Multiplatform</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Fri, 04 Jul 2025 08:43:40 +0000</pubDate>
      <link>https://forem.com/dchaley/kotlin-in-the-browser-attempting-firebase-multiplatform-f6k</link>
      <guid>https://forem.com/dchaley/kotlin-in-the-browser-attempting-firebase-multiplatform-f6k</guid>
      <description>&lt;p&gt;I &lt;a href="https://dev.to/dchaley/kotlin-in-the-browser-g3f"&gt;started using Kotlin&lt;/a&gt; to write browser apps about 1.5 years ago. It's definitely an early adopter experience. It worked well for a non-trivial but basic app. For the next frontier, multi-platform development with Firebase, I got stuck.&lt;/p&gt;

&lt;p&gt;Since then, I've largely completed my basic needs for &lt;a href="https://github.com/dchaley/you-need-a-splitter" rel="noopener noreferrer"&gt;You Need a Splitter&lt;/a&gt;: an app that integrates with &lt;a href="https://www.ynab.com/" rel="noopener noreferrer"&gt;You Need a Budget&lt;/a&gt; to speed up my budget splitting workflow.&lt;/p&gt;

&lt;p&gt;In parallel I also wrote &lt;a href="https://github.com/redwoodconsulting-io/reservations-app/" rel="noopener noreferrer"&gt;ReservationsApp&lt;/a&gt; in TypeScript with Angular, using Firebase. This gave me a baseline for comparison.&lt;/p&gt;

&lt;p&gt;YNAS was written with &lt;a href="https://kvision.gitbook.io/" rel="noopener noreferrer"&gt;KVision&lt;/a&gt;, an object-oriented web framework for Kotlin/JS. I was able to develop a non-trivial app: multiple components, dialog boxes, asynchronous interaction… All in all, it was a good experience.&lt;/p&gt;

&lt;p&gt;At first I struggled with state management (understanding which changes did and didn't trigger reactive renderings). I found Angular's state management clearer, although Angular has changed so much &amp;amp; so fast that some community content is obsolete. (On that note, the KVision community is much smaller.)&lt;/p&gt;

&lt;p&gt;One example is using arrays: is it changing the array, or its contents, that triggers a re-render? And how do you avoid tearing down &amp;amp; rebuilding elements that didn't change, if you react to the list reference changing?&lt;/p&gt;

&lt;p&gt;There were also rough edges around exact DOM interaction. Here's &lt;a href="https://github.com/dchaley/you-need-a-splitter/pull/14/commits/8b61021d5d901768955debd4acafa47da43720dd" rel="noopener noreferrer"&gt;an example&lt;/a&gt; where a component re-render wasn't updating FontAwesome icons. I needed to use a special unique DOM key so the buttons' icon changes would take effect.&lt;/p&gt;

&lt;p&gt;An interesting side quest: to use the YNAB javascript SDK in Kotlin, I needed to generate Kotlin type declarations. I wrote a &lt;a href="https://github.com/dchaley/you-need-a-splitter/tree/main/src/jsMain/kotlin/ynab" rel="noopener noreferrer"&gt;quick README&lt;/a&gt; with some notes. TLDR, I used &lt;a href="https://github.com/Kotlin/dukat" rel="noopener noreferrer"&gt;dukat&lt;/a&gt; to generate the declarations but had some manual work to do.&lt;/p&gt;

&lt;p&gt;All in all: I thought it was a success 😤 My main regret though was not accomplishing the multiplatform vision. Running YNAS on a mobile app would require a UX rewrite. 😩&lt;/p&gt;

&lt;p&gt;Emboldened by this experience I wanted to tackle that more ambitious project: a Kotlin-Multiplatform app backed by Firebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt: a multiplatform Firebase app
&lt;/h2&gt;

&lt;p&gt;I'm building a financial planning application. You Need a Budget is great for understanding where your money went and allocating it to future goals. But it doesn't help me plan for the future in bigger ways: asking what-if questions, like buying with cash vs. financing a purchase.&lt;/p&gt;

&lt;p&gt;The UX would center on charts: projecting cash flow scenarios, that sort of thing.&lt;/p&gt;

&lt;p&gt;Why Firebase? Two reasons: serverless, and realtime data access. I really don't want to maintain (or pay for) servers but I do need data accessible beyond my laptop. The realtime data is a delightful cherry on top: the reactive rendering loop is built on shared data, not just local state.&lt;/p&gt;

&lt;p&gt;I was encouraged by GitLive's &lt;a href="https://github.com/GitLiveApp/firebase-kotlin-sdk" rel="noopener noreferrer"&gt;Firebase Kotlin SDK&lt;/a&gt; … at first glance, it was exactly what I need: a multiplatform Kotlin library.&lt;/p&gt;

&lt;p&gt;Now for the UX framework. The Compose-Multiplatform framework's big promise is to support "all" targets: web, desktop, iOS, and Android.&lt;/p&gt;

&lt;p&gt;Compose-Multiplatform supports web through WASM, not regular JavaScript (that is, Kotlin &lt;a href="https://en.wikipedia.org/wiki/Source-to-source_compiler" rel="noopener noreferrer"&gt;transpiled&lt;/a&gt; to JavaScript, indistinguishable from normal JavaScript apps).&lt;/p&gt;

&lt;p&gt;The WASM build has some caveats (I discussed in my &lt;a href="https://dev.to/dchaley/kotlin-in-the-browser-g3f"&gt;previous post&lt;/a&gt;). But if it meant one codebase for all platforms, I was willing to bet on WASM's future development.&lt;/p&gt;

&lt;p&gt;Then I needed a charting component. I settled on &lt;a href="https://github.com/KoalaPlot/koalaplot-core/tree/main" rel="noopener noreferrer"&gt;KoalaPlot&lt;/a&gt; which has a variety of plots, various customization options, and ongoing development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Result: defeat
&lt;/h2&gt;

&lt;p&gt;Alas, I got &lt;em&gt;this&lt;/em&gt; close, but not close enough.&lt;/p&gt;

&lt;p&gt;I was able to build a HelloWorld app running on all platforms (well, I didn't test iOS, and only ran Android on the emulator). Here it is running on web (after &lt;a href="https://github.com/redwoodconsulting-io/streamwise/pull/3" rel="noopener noreferrer"&gt;PR#3&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1f0tfexjb02lg8zj1up2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1f0tfexjb02lg8zj1up2.png" alt="Chart showing a line incrementing upwards" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So far so good…! I even &lt;a href="https://github.com/redwoodconsulting-io/streamwise/pull/4" rel="noopener noreferrer"&gt;got hot-reload working&lt;/a&gt; for desktop, a major development cycle speedup.&lt;/p&gt;

&lt;p&gt;Then I tried incorporating the GitLive Firebase SDK … I hit a hard wall and the magic ended. 😮‍💨&lt;br&gt;
It doesn't support WASM (&lt;a href="https://github.com/GitLiveApp/firebase-kotlin-sdk/issues/440" rel="noopener noreferrer"&gt;issue #440&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;But it does support Kotlin/JS, meaning a Kotlin project transpiling to Javascript could use GitLive's Kotlin Firebase SDK.&lt;/p&gt;

&lt;p&gt;However, that meant getting Compose itself running on Kotlin/JS. I gather it's possible…? Or at least, at some point in the experimental lifecycle of Compose-for-Web, Compose-HTML, and other library evolutions, it was maybe possible to do some things.&lt;/p&gt;

&lt;p&gt;KoalaPlot itself also referenced Kotlin/JS so I was hopeful that I could convince the pieces to work together.&lt;/p&gt;

&lt;p&gt;… No.&lt;/p&gt;

&lt;p&gt;I got a Compose app compiling, but it was unable to find various font rendering functions at runtime. I went down a rabbit-hole of preloading custom emoji fonts, an apparently related but actually different set of problems. I had to downgrade from material3 to material to get it compiling. The problems piled up but solutions did not.&lt;/p&gt;

&lt;p&gt;My conclusion was, if I want Firebase, I can't have WASM. And if I can't have WASM, then I can't have Compose. And if I can't have Compose, then I can't build a UI toward multiple targets. Sad…&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;At this point, I don't have great options with Kotlin:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Multiplatform&lt;/th&gt;
&lt;th&gt;Kotlin/JS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Firebase&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UX framework&lt;/td&gt;
&lt;td&gt;Compose&lt;/td&gt;
&lt;td&gt;Questionable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;How feasible is building Firebase on Multiplatform? The discussion on &lt;a href="https://github.com/GitLiveApp/firebase-kotlin-sdk/issues/440" rel="noopener noreferrer"&gt;issue #440&lt;/a&gt; shows several attempts to get GitLive's Firebase SDK working on WASM, with only limited success (e.g. just the Auth library). That's great, but I need more; Firestore in particular.&lt;/p&gt;

&lt;p&gt;In principle, I could write a multiplatform interface layer, with a WASM native implementation that calls out to the JS SDK… but no, I'm not gonna do that.&lt;/p&gt;

&lt;p&gt;I could use the Firebase SDK on mobile &amp;amp; desktop, but if I have to choose between those and web, I choose web. Mobile apps pose a number of distribution challenges that simply aren't worth it at this stage. Besides, a responsive web app would be OK on a phone.&lt;/p&gt;

&lt;p&gt;Given that I'm stuck with Kotlin/JS for now, it's worth highlighting how straightforward it was to build an Angular + Firebase app with TypeScript. There's no question that working within a major web app framework provides a far more integrated &amp;amp; seamless experience. I still prefer Kotlin 😝😭 but is it worth it?&lt;/p&gt;

&lt;p&gt;Generally, I like my code's core framework (e.g. the UX layer) to be battle-tested. I have limited tolerance for rough edges. 🫩 &lt;/p&gt;

&lt;p&gt;To proceed with Kotlin, I'd need to use Kotlin/JS, which means sticking with KVision (I don't want to learn yet another non-Compose Kotlin+HTML framework). While KVision is fine (and I've already paid the learning price), the KVision developer has put recent focus into &lt;a href="https://kilua.gitbook.io/kilua-guide/introduction" rel="noopener noreferrer"&gt;Kilua&lt;/a&gt;, which does target WASM, working directly with the DOM no less. But it's even more cutting-edge than KVision.&lt;/p&gt;

&lt;p&gt;Could I combine Kotlin with Angular? Well, in principle Kotlin/JS means it's "just" a question of getting the build system set up correctly. Brian White wrote a &lt;a href="https://stackoverflow.com/a/75478365/211771" rel="noopener noreferrer"&gt;StackOverflow post&lt;/a&gt; explaining how to do this; it's worth a try, although 2 years is "forever" in early-adoption timelines. Besides, this lets Angular see the Kotlin code: I don't think it lets Kotlin code drive the Angular framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;I had high hopes for this, high enough to spend about 10 hours exploring all these rough edges. Unfortunately, at this juncture multiplatform building (mobile plus web) remains elusive. It's not enough to compile code to run on a platform: you also need frameworks to turn code into pixels, etc…&lt;/p&gt;

&lt;p&gt;Given all this friction: is Kotlin in the browser worth it? The main advantage (besides language preference) is a monorepo that powers frontend and backend. After all: background processing needs to wrangle the same data types. What if I learned Dart/Flutter…&lt;/p&gt;

&lt;p&gt;But today, I need to move on with my problem: visualizing my cash flow scenarios. Here's my backup option, the simplest solution: duplication. Build Kotlin + TypeScript code around Firebase's data model, and make sure the data models don't get out of sync.&lt;/p&gt;

&lt;p&gt;One day, as these frameworks mature, I'll revisit how much more I can do with Kotlin in the browser. In the meantime, my quip from Part I feels even more real:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi4iv299sordpvjlj5xp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi4iv299sordpvjlj5xp.png" alt="Boromir saying: one does not simply render HTML" width="651" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My goals were ambitious– perhaps one does not simply do &lt;em&gt;anything&lt;/em&gt;. 😎&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>kotlin</category>
      <category>browser</category>
      <category>firebase</category>
    </item>
    <item>
      <title>Performance trap: general libraries &amp; helper objects</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Wed, 09 Oct 2024 23:26:30 +0000</pubDate>
      <link>https://forem.com/dchaley/performance-trap-general-libraries-helper-objects-h2k</link>
      <guid>https://forem.com/dchaley/performance-trap-general-libraries-helper-objects-h2k</guid>
<description>&lt;p&gt;Convenience and performance are typically inversely correlated. If the code is easy to use, it's less optimized. If it's optimized, it's less convenient. Efficient code needs to get closer to the nitty-gritty details of what is actually running, and how.&lt;/p&gt;

&lt;p&gt;I came across an example in our ongoing work to run &amp;amp; optimize DeepCell cellular segmentation for cancer research. The DeepCell AI model predicts which pixels are most likely to be in a cell. From there, we "flood fill" from the most likely pixels, until reaching the cell border (below some threshold).&lt;/p&gt;

&lt;p&gt;Part of this process involves smoothing over small gaps inside predicted cells, which can happen for various reasons but isn't biologically possible. (Think donut holes, not a cell's porous membrane.)&lt;/p&gt;

&lt;p&gt;The hole-filling algorithm goes like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify objects (contiguous pixels sharing the same numeric cell label).&lt;/li&gt;
&lt;li&gt;Compute the "&lt;a href="https://en.wikipedia.org/wiki/Euler_characteristic" rel="noopener noreferrer"&gt;Euler number&lt;/a&gt;" of these cells, a topological measure of the shape (in 2D, the number of objects minus the number of holes).&lt;/li&gt;
&lt;li&gt;If the Euler number is less than 1 (i.e., the shape has holes), smooth out the holes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is an example of Euler numbers from the Wikipedia article; a circle (just the line part) has an Euler characteristic of zero, whereas a disk (the "filled-in" circle) has an Euler characteristic of 1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3q91ix1bf9derb7ovpc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3q91ix1bf9derb7ovpc.png" alt="Picture of a circular line with Euler number 0, and a circular disk with Euler number 1" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;
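&lt;p&gt;We can sanity-check those two values with scikit-image (a quick sketch; it assumes &lt;code&gt;skimage.measure.euler_number&lt;/code&gt; with its default 2D connectivity):&lt;/p&gt;

```python
import numpy as np
from skimage.measure import euler_number

# A filled 5x5 "disk": one object, no holes.
disk = np.ones((5, 5), dtype=bool)

# A "circle": the same square with its interior removed, leaving one hole.
ring = disk.copy()
ring[1:4, 1:4] = False

print(euler_number(disk))  # 1 (one object, no holes)
print(euler_number(ring))  # 0 (one object minus one hole)
```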

&lt;p&gt;We're not here to talk about defining or computing Euler numbers though. We'll talk about how the library's easy path to computing Euler numbers is quite inefficient.&lt;/p&gt;

&lt;p&gt;First things first. We noticed the problem by looking at this profile using &lt;a href="https://www.speedscope.app/" rel="noopener noreferrer"&gt;Speedscope&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fex6jodu2lohcvm8in0nk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fex6jodu2lohcvm8in0nk.png" alt="Speedscope profile of the postprocessing runtime" width="800" height="623"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It shows ~32ms (~15%) spent in &lt;code&gt;regionprops&lt;/code&gt;. This view is left-heavy; if we switch to timeline view and zoom in, we get this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhm4nwzx49n5pm7ame6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhm4nwzx49n5pm7ame6b.png" alt="Speedscope profile zoomed in to regionprops, showing a very large number of very small function calls" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Note that we do this twice, hence ~16ms here and ~16ms elsewhere, not shown.)&lt;/p&gt;

&lt;p&gt;This is immediately suspect: the "interesting" part of finding the objects with &lt;code&gt;find_objects&lt;/code&gt; is that first sliver, 0.5ms. It returns a list of tuples, not a generator, so when it's done it's done. So what's up with all the other stuff? We're &lt;a href="https://github.com/scikit-image/scikit-image/blob/e3e7c48b76d38a5f92db6ff9f355f933f9f977ff/skimage/measure/_regionprops.py#L1383-L1390" rel="noopener noreferrer"&gt;constructing &lt;code&gt;RegionProperties&lt;/code&gt; objects&lt;/a&gt;. Let's zoom in on one of them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28273vtmliqsih8rfldb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28273vtmliqsih8rfldb.png" alt="Profile creating a single RegionProperty object" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The tiny slivers (which we won't zoom into) are custom &lt;code&gt;__setattr__&lt;/code&gt; calls: the RegionProperties objects support aliasing, for instance if you set the attribute &lt;code&gt;ConvexArea&lt;/code&gt; it redirects to a standard attribute &lt;code&gt;area_convex&lt;/code&gt;. Even though we're not making use of that we still go through the attribute converter.&lt;/p&gt;
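&lt;p&gt;To illustrate the mechanism (a hypothetical sketch, not skimage's actual code): every attribute write funnels through a lookup, whether or not the legacy alias is used.&lt;/p&gt;

```python
class AliasedProperties:
    """Redirects legacy attribute names to their canonical equivalents."""

    _aliases = {"ConvexArea": "area_convex"}

    def __setattr__(self, name, value):
        # Every write pays for this lookup, aliased or not.
        object.__setattr__(self, self._aliases.get(name, name), value)


props = AliasedProperties()
props.ConvexArea = 42     # redirected to area_convex
props.euler_number = 0    # not aliased, but still routed through __setattr__
print(props.area_convex)  # 42
```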

&lt;p&gt;Furthermore, we aren't even using most of the calculated region properties. We only care about the Euler number:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;props&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;regionprops&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;squeeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label_img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;int&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;prop&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;prop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;euler_number&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In turn, that only uses the most basic aspect of the region properties: the image regions detected by &lt;code&gt;find_objects&lt;/code&gt; (slices of the original image).&lt;/p&gt;

&lt;p&gt;So, we changed the &lt;code&gt;fill_holes&lt;/code&gt; code to simply bypass the &lt;code&gt;regionprops&lt;/code&gt; general-purpose function. Instead, we call &lt;code&gt;find_objects&lt;/code&gt; directly and pass the resulting image sub-regions to the &lt;code&gt;euler_number&lt;/code&gt; function (not the method on a &lt;code&gt;RegionProperties&lt;/code&gt; object).&lt;/p&gt;
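&lt;p&gt;In sketch form (hedged: the function name is mine, and it assumes &lt;code&gt;scipy.ndimage.find_objects&lt;/code&gt; plus the free function &lt;code&gt;skimage.measure.euler_number&lt;/code&gt;), the replacement loop looks something like:&lt;/p&gt;

```python
import numpy as np
from scipy import ndimage
from skimage.measure import euler_number

def labels_with_holes(label_img):
    """Find labels whose objects have holes, without building RegionProperties."""
    holes = []
    # find_objects returns one bounding slice per label (None if a label is unused).
    for label, region in enumerate(ndimage.find_objects(label_img), start=1):
        if region is None:
            continue
        mask = label_img[region] == label
        # An Euler number below 1 means the object's surface has gaps.
        if euler_number(mask) < 1:
            holes.append(label)
    return holes

# Example: a 5x5 block of label 1 with a one-pixel hole in the middle.
img = np.zeros((7, 7), dtype=int)
img[1:6, 1:6] = 1
img[3, 3] = 0
print(labels_with_holes(img))  # [1]
```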

&lt;p&gt;Here's the pull request: &lt;a href="https://github.com/dchaley/deepcell-imaging/pull/358" rel="noopener noreferrer"&gt;deepcell-imaging#358 Skip regionprops construction&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By skipping the intermediate object, we got a decent performance improvement for the &lt;code&gt;fill_holes&lt;/code&gt; operation:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Image size&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;260k pixels&lt;/td&gt;
&lt;td&gt;48ms&lt;/td&gt;
&lt;td&gt;40ms&lt;/td&gt;
&lt;td&gt;8ms (17%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;140M pixels&lt;/td&gt;
&lt;td&gt;15.6s&lt;/td&gt;
&lt;td&gt;11.7s&lt;/td&gt;
&lt;td&gt;3.9s (25%)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For the larger image, 4s is ~3% of the overall runtime: not the bulk of it, but not too shabby either.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>software</category>
      <category>python</category>
      <category>bioinformatics</category>
    </item>
    <item>
      <title>Improve container build time by 70% w/ better caching</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Thu, 26 Sep 2024 00:34:37 +0000</pubDate>
      <link>https://forem.com/dchaley/improve-container-build-time-by-70-w-better-caching-2o2l</link>
      <guid>https://forem.com/dchaley/improve-container-build-time-by-70-w-better-caching-2o2l</guid>
      <description>&lt;p&gt;Our ongoing work to run DeepCell on GCP Batch produces a very large container: 5 GB compressed. Most of it is the Python &amp;amp; binaries required to run TensorFlow and all associated GPU code. It took ~13 minutes to build on GCP Cloud Build.&lt;/p&gt;

&lt;p&gt;By &lt;a href="https://github.com/dchaley/deepcell-imaging/pull/353" rel="noopener noreferrer"&gt;leveraging Docker's cache better&lt;/a&gt;, we brought that down to ~4 minutes, a roughly 70% improvement.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;13min&lt;/td&gt;
&lt;td&gt;4min&lt;/td&gt;
&lt;td&gt;-9min (-70%)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Docker builds containers by creating a layer for each build command. The layers "stack" onto each other, adding or changing what's in the container so far. Loosely speaking, the layers are like snapshots of the container contents.&lt;/p&gt;

&lt;p&gt;Docker can &lt;a href="https://docs.docker.com/build/cache/" rel="noopener noreferrer"&gt;cache&lt;/a&gt; layers in the build process. Unless the build instruction changes, like updating the command or copying a different source file, the layer doesn't need to be rebuilt.&lt;/p&gt;

&lt;p&gt;Our Dockerfile looked like this: (&lt;a href="https://github.com/dchaley/deepcell-imaging/blob/9393e407d023a31366bb5fc054ff640d281ad802/container/Dockerfile" rel="noopener noreferrer"&gt;unabridged version here&lt;/a&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM &amp;lt;base_container&amp;gt;

RUN apt-get update -y &amp;amp;&amp;amp; apt-get install -y &amp;lt;packages&amp;gt;

# Add the repo sha to the container as the version.
ADD https://api.github.com/repos/dchaley/deepcell-imaging/git/refs/heads/main version.json

# Clone the deepcell-imaging repo
RUN git clone https://github.com/dchaley/deepcell-imaging.git

# Switch into the repo directory
WORKDIR "/deepcell-imaging"

# Install python requirements
RUN pip install --user --upgrade -r requirements.txt

# Install our own module
RUN pip install .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we first added caching, we only saw a smaller speedup, about 30%. We avoided reinstalling the &lt;code&gt;apt-get&lt;/code&gt; packages, but we were still reinstalling the Python dependencies … some of which (like TensorFlow) are very hefty, and many of which require compilation.&lt;/p&gt;

&lt;p&gt;The full &lt;a href="https://docs.docker.com/build/cache/invalidation/" rel="noopener noreferrer"&gt;cache invalidation rules&lt;/a&gt; are a bit tricky.  But the basic idea is simple. Layers are invalidated if the command changes or copied files change. If any layer is invalidated, all subsequent layers must be rebuilt.&lt;/p&gt;

&lt;p&gt;In our case, by adding &lt;code&gt;version.json&lt;/code&gt;, we were invalidating everything below, in particular installing the Python dependencies from &lt;code&gt;requirements.txt&lt;/code&gt;. But these change quite rarely, compared to our application code!&lt;/p&gt;

&lt;p&gt;Normally it's a GoodThing™️ to force a rebuild if code changes. But we don't want to lose the Python dependencies cache. To stop invalidating the cache for dependencies, we explicitly pulled in just &lt;code&gt;requirements.txt&lt;/code&gt;, installed those, and then pulled in the overall source code. This means we still rebuild dependencies if they change, but if they don't … we don't!&lt;/p&gt;

&lt;p&gt;Our new Dockerfile looks like this: (&lt;a href="https://github.com/dchaley/deepcell-imaging/blob/90c67248548c929ca056bf1447353c15376bbf12/container/Dockerfile" rel="noopener noreferrer"&gt;unabridged version here&lt;/a&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM &amp;lt;base_container&amp;gt;

RUN apt-get update -y &amp;amp;&amp;amp; apt-get install -y &amp;lt;packages&amp;gt;

# Fetch the Python dependencies
ADD https://raw.githubusercontent.com/dchaley/deepcell-imaging/refs/heads/main/requirements.txt requirements.txt

# Install python requirements
RUN pip install --user --upgrade -r requirements.txt

# Add the repo sha to the container as the version.
ADD https://api.github.com/repos/dchaley/deepcell-imaging/git/refs/heads/main version.json

# Clone the deepcell-imaging repo
RUN git clone https://github.com/dchaley/deepcell-imaging.git

# Switch into the repo directory
WORKDIR "/deepcell-imaging"

# Install our own module
RUN pip install .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we rebuilt the container after a small code change, and observed the fuller benefits of the cache: avoiding the needless rebuilding of the Python dependencies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmlw8ripsw4g0kknr34q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmlw8ripsw4g0kknr34q.png" alt="GitHub actions showing before/after build times of 13m38s to 4m37s" width="800" height="230"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;It's been really interesting learning the various ways of building containers &amp;amp; their pros/cons. A lot of containers are built by copying files from the local directories into the container, rather than checking out from source. This has advantages: for example, you can build a test/dev container from whatever you currently have. I wanted to simplify and make sure the container was &lt;em&gt;always&lt;/em&gt; built from &lt;code&gt;main&lt;/code&gt;. The double-edged sword of simplicity.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>tensorflow</category>
      <category>googlecloud</category>
      <category>cicd</category>
    </item>
    <item>
      <title>Optimizing QuPath intensity measurements: 12.5 hr to 2min</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Sat, 31 Aug 2024 06:50:54 +0000</pubDate>
      <link>https://forem.com/dchaley/optimizing-qupath-intensity-measurements-125-hr-2min-416c</link>
      <guid>https://forem.com/dchaley/optimizing-qupath-intensity-measurements-125-hr-2min-416c</guid>
      <description>&lt;p&gt;Spatial biology analyzes tissue sample images to derive patterns and data. A key first step is identifying cells on the image and gathering quantitative measurements about those cells.&lt;/p&gt;

&lt;p&gt;In our ongoing work &lt;a href="https://github.com/dchaley/deepcell-imaging" rel="noopener noreferrer"&gt;scaling DeepCell on GCP Batch&lt;/a&gt;, we'd previously gotten pretty efficient at the first part: segmenting the image into cells. But we hit a major performance roadblock for the next step: generating quantitative measurements.&lt;/p&gt;

&lt;p&gt;The measurements are fairly straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;size of each cell (convert pixels in each detected cell to physical dimensions, assuming some number of microns per pixel)&lt;/li&gt;
&lt;li&gt;pixel intensity of each cell&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of note, for a ~140M pixel image, it took about 12.5 hours (‼️) to measure the detected cells. That's … not great 😩 What the heck?? We're just counting pixels and reading pixel values. An HD image is ~2M pixels, and computers (and TVs &amp;amp; phones) render &amp;gt;30 of those &lt;em&gt;per second&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Profiling to the rescue. The great thing about JVM code is that it's extremely easy to profile. Just click "profile" instead of "run".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3dat807zdd2b20c9asu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3dat807zdd2b20c9asu.png" alt="Screenshot of profiler button" width="660" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the resulting flamegraph.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifqwr6286rhnyvnxlxvk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifqwr6286rhnyvnxlxvk.png" alt="IntelliJ flamegraph adding cell measurements" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of note, 99.9% of adding intensity measurements–84% of the total time–is spent simply reading the image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F734ph3ia6hcrzpwei73b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F734ph3ia6hcrzpwei73b.png" alt="Screenshot of time spent in readRegion: 84.25% of all, 99.88% of parent, amounting to 79.5 seconds" width="800" height="108"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OK: so we need to stop reading the image repeatedly. In our case, the entire image can (for now) fit into RAM. If only we could prefetch the image once, then read regions out of that in-memory copy.&lt;/p&gt;

&lt;p&gt;Sounds like a great use case for the &lt;a href="https://en.wikipedia.org/wiki/Proxy_pattern" rel="noopener noreferrer"&gt;Proxy pattern&lt;/a&gt;. We need an &lt;code&gt;ImageServer&lt;/code&gt; that behaves just like the original image server, except, it reads from an in-memory image not from disk (or wherever the wrapped server reads).&lt;/p&gt;

&lt;p&gt;The resulting code is quite simple. Here's the &lt;a href="https://github.com/dchaley/qupath-project-initializer/pull/41" rel="noopener noreferrer"&gt;pull request&lt;/a&gt;. We subclass the abstract &lt;code&gt;ImageServer&lt;/code&gt;, wrapping another &lt;code&gt;ImageServer&lt;/code&gt; and forwarding all methods to the original.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;UPDATE 2024-09-10&lt;/em&gt;: Thanks to Adrián Szegedi (GitHub &lt;a href="https://github.com/HawkSK" rel="noopener noreferrer"&gt;HawkSK&lt;/a&gt;) the code is even simpler (&lt;a href="https://github.com/dchaley/qupath-project-initializer/pull/42" rel="noopener noreferrer"&gt;PR#42&lt;/a&gt;): no need to explicitly forward methods. Instead we use Kotlin's &lt;a href="https://kotlinlang.org/docs/delegation.html" rel="noopener noreferrer"&gt;delegation syntax&lt;/a&gt; which implicitly forwards non-overridden methods. This removes 100 lines of boilerplate 💪🏻&lt;/p&gt;

&lt;p&gt;The one non-forwarded method is the core operation: reading a region.&lt;/p&gt;

&lt;p&gt;That one turns into extracting the region from the entire (prefetched) image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  private fun readFullImage() {
    if (prefetchedImage != null)
      return

    logger.info("Prefetching full image at path: ${wrappedImageServer.path}")

    val wholeImageRequest = RegionRequest.createInstance(
      wrappedImageServer.path,
      1.0,
      0,
      0,
      wrappedImageServer.width,
      wrappedImageServer.height
    )
    prefetchedImage = wrappedImageServer.readRegion(wholeImageRequest)
  }

  override fun readRegion(request: RegionRequest?): BufferedImage {
    if (request?.z != 0 || request?.t != 0)
      throw IllegalArgumentException("PrefetchedImageServer only supports z=0 and t=0")

    readFullImage()
    return prefetchedImage!!.getSubimage(request!!.x, request.y, request.width, request.height)
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way, we only read the image once, and fetch all subregions from the in-memory image.&lt;/p&gt;

&lt;p&gt;Here's the speed-up in the real world (Google Batch):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh7qc100w58ghcpfuwx39.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh7qc100w58ghcpfuwx39.png" alt="Google Batch jobs showing new runtime 2min 14s, and old runtime 12hr 25min" width="800" height="161"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before (min)&lt;/th&gt;
&lt;th&gt;After (min)&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;745&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;-743 min (-99.7%)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In the words of the great Tina Turner: Boom, Shaka Laka.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>bioinformatics</category>
      <category>kotlin</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Re-rebuilding TF2.8 image: 369 patches</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Sun, 18 Aug 2024 07:56:00 +0000</pubDate>
      <link>https://forem.com/dchaley/re-rebuilding-tf28-image-369-patches-kif</link>
      <guid>https://forem.com/dchaley/re-rebuilding-tf28-image-369-patches-kif</guid>
      <description>&lt;p&gt;I wrote previously about &lt;a href="https://dev.to/dchaley/rebuilding-tensorflow-284-on-ubuntu-2204-to-patch-vulnerabilities-3j3m"&gt;rebuilding the TF2.8 image to patch vulnerabilities&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I noticed the issue scanner's count had crept up again. So I rebuilt the container again, using the same Dockerfiles (etc.).&lt;/p&gt;

&lt;p&gt;The upstream changes (e.g. in Ubuntu 22.04) pulled in 369 security fixes. 🎉&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13zdfgbdp376uj5u5vcs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13zdfgbdp376uj5u5vcs.png" alt="A screenshot of the Google artifact repository showing vulnerability counts going from 839 to 470" width="800" height="166"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Along the way, I regretted that the &lt;a href="https://github.com/dchaley/tensorflow-2.8.4-redux" rel="noopener noreferrer"&gt;tensorflow-2.8.4-redux repo&lt;/a&gt; doesn't have automatic container building. I can't build it locally any longer as it needs x86_64 but I have arm64. 😓 I was able to build it in Cloud Shell easily enough for now.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>tensorflow</category>
      <category>security</category>
      <category>ai</category>
    </item>
    <item>
      <title>gs-fastcopy: get CPU count for upload workers</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Tue, 23 Jul 2024 02:48:43 +0000</pubDate>
      <link>https://forem.com/dchaley/gs-fastcopy-get-cpu-count-for-upload-workers-3ke7</link>
      <guid>https://forem.com/dchaley/gs-fastcopy-get-cpu-count-for-upload-workers-3ke7</guid>
      <description>&lt;p&gt;See previous post: &lt;a href="https://dev.to/dchaley/introducing-gs-fastcopy-9pi"&gt;Introducing gs-fastcopy&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I shipped the enhancement &lt;a href="https://github.com/redwoodconsulting-io/gs-fastcopy-python/issues/10" rel="noopener noreferrer"&gt;gs-fastcopy-python#10: Inspect processor count for better upload defaults&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Previously, we were defaulting to 8 workers (Google's default). On a system with more than 8 cores, that's leaving a lot idle!&lt;/p&gt;

&lt;p&gt;Now, we inspect the available CPU count. We honor &lt;code&gt;os.sched_getaffinity&lt;/code&gt; on systems that support it (the processors available to &lt;em&gt;this process&lt;/em&gt;, not just in general); otherwise, we use &lt;code&gt;os.cpu_count()&lt;/code&gt;.&lt;/p&gt;
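&lt;p&gt;The check boils down to a few lines (a sketch of the logic; the helper name is mine, not the library's):&lt;/p&gt;

```python
import os

def default_upload_workers(fallback=8):
    """Pick a worker count: the CPUs available to this process, if knowable."""
    try:
        # Linux: respects CPU affinity (e.g. container limits), not just the machine.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # macOS / Windows: fall back to the total CPU count, or Google's default of 8.
        return os.cpu_count() or fallback
```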

&lt;p&gt;Benchmarking results: [&lt;a href="https://docs.google.com/spreadsheets/d/13LRSnfddt4nSBfc-MAx8jTL3xCp7riqGR8Vcc_skUxw/edit?gid=1577232550#gid=1577232550" rel="noopener noreferrer"&gt;source sheet&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgj23zxl4a2grurlsr7ut.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgj23zxl4a2grurlsr7ut.png" alt="Bar chart showing time taken to complete upload operation, with and without compressing first" width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note how adding workers speeds up the process, but with diminishing returns. I think that's the point where the network transfer itself becomes the bottleneck, though it's likely that tweaking chunk sizes would help too.&lt;/p&gt;

&lt;p&gt;Also note the more dramatic effect when using compression (via &lt;code&gt;pigz&lt;/code&gt;, parallel gzip). pigz would've picked up on the max workers before; what's new here is using them for the upload as well.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>cloudstorage</category>
      <category>googlecloud</category>
      <category>python</category>
    </item>
    <item>
      <title>Introducing gs-fastcopy</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Sun, 21 Jul 2024 23:23:35 +0000</pubDate>
      <link>https://forem.com/dchaley/introducing-gs-fastcopy-9pi</link>
      <guid>https://forem.com/dchaley/introducing-gs-fastcopy-9pi</guid>
      <description>&lt;p&gt;These days, a single laptop can chomp through gigabytes of data in seconds. So why was it taking ~1.5min to compress &amp;amp; upload 2 GB? Why was it taking ~10s to download just 100 MB?&lt;/p&gt;

&lt;p&gt;I get bothered by code that "should" be fast but isn't, when I have to wait around for it. Maybe it's 30+ years of experience with software, 25+ of them in web dev: I have a pretty good sense of when something is slower than it "should" be.&lt;/p&gt;

&lt;p&gt;And o', but am I never satisfied with needlessly slow code.&lt;/p&gt;

&lt;p&gt;Time is both time and money. The faster cancer researchers can process data, the faster we get to innovative treatments and save lives. And going 2x as fast on the same hardware typically means spending half as much. In an eventual clinical setting where tests are given freely, every cent matters… which can mean life &amp;amp; death.&lt;/p&gt;

&lt;p&gt;I checked with my co-conspirator Lynn Langit: "these speeds, but really though?" She pointed me at the gcloud CLI tool's much superior performance in file transfer. &lt;/p&gt;

&lt;p&gt;That began an investigation into optimizing transfer: basically, the standard Python (&amp;amp; other) Blob implementation is single-threaded. So much computing power just … sitting there sad &amp;amp; idle.&lt;/p&gt;

&lt;p&gt;It's nice when default settings "just work" – correctly, but also fast. The numpy library is absolutely brilliant because it brings all kinds of low-level hardware optimization into Python; you don't have to think about it.&lt;/p&gt;

&lt;p&gt;In that spirit, I hope to make cloud storage file transfer just that much easier, so that you don't have to think about it to get fast performance.&lt;/p&gt;

&lt;p&gt;Without further ado: introducing gs-fastcopy:&lt;br&gt;
&lt;a href="https://medium.com/@dchaley/introducing-gs-fastcopy-36bb3bb71818" rel="noopener noreferrer"&gt;https://medium.com/@dchaley/introducing-gs-fastcopy-36bb3bb71818&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's my first open-source public Python package 🐍 📦 🎉&lt;/p&gt;

&lt;p&gt;Package: &lt;a href="https://pypi.org/project/gs-fastcopy/" rel="noopener noreferrer"&gt;https://pypi.org/project/gs-fastcopy/&lt;/a&gt;&lt;br&gt;
Source code: &lt;a href="https://github.com/redwoodconsulting-io/gs-fastcopy-python" rel="noopener noreferrer"&gt;https://github.com/redwoodconsulting-io/gs-fastcopy-python&lt;/a&gt;&lt;/p&gt;
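&lt;p&gt;Usage looks roughly like this. This sketch is based on the package README at the time of writing; treat the API names as assumptions and check the README for the current interface (it also needs GCS credentials to actually run):&lt;/p&gt;

```python
# Sketch based on the gs-fastcopy README; bucket/object names are
# placeholders, and the API should be verified against the README.
import gs_fastcopy
import numpy as np

# Writing: yields a local file object; on close, the file is
# uploaded to cloud storage using parallel transfer.
with gs_fastcopy.write("gs://my-bucket/my-file.npz") as f:
    np.savez(f, a=np.zeros(12), b=np.ones(23))

# Reading: downloads with parallel transfer, then yields a local
# file object to read from.
with gs_fastcopy.read("gs://my-bucket/my-file.npz") as f:
    npz = np.load(f)
```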

&lt;p&gt;Now I download &amp;amp; uncompress those 100 MB in just a couple of seconds, not 10. I'll take a 5x speedup. And the impact only grows as the files get larger.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra7kvarot9abdauuhrtw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra7kvarot9abdauuhrtw.png" alt="Bar chart of benchmark results for local &amp;amp; cloud environments, using Blob-based smart_open and gs-fastcopy" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>performance</category>
      <category>googlecloud</category>
      <category>cloudstorage</category>
    </item>
    <item>
      <title>Ensuring GCE instances have full access to GCP APIs</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Sat, 20 Jul 2024 02:58:50 +0000</pubDate>
      <link>https://forem.com/dchaley/ensuring-gce-instances-have-full-access-to-gcp-apis-3cdg</link>
      <guid>https://forem.com/dchaley/ensuring-gce-instances-have-full-access-to-gcp-apis-3cdg</guid>
      <description>&lt;p&gt;The default settings for GCE instances are fairly locked down from accessing Google APIs, but it's not obvious that's happening!&lt;/p&gt;

&lt;p&gt;Check out the instance creation settings:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddfx9q1qifce0isgpwpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddfx9q1qifce0isgpwpz.png" alt="Screenshot of the Identity and API access settings" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You might think that "allow default access" means "use normal permissions as already configured". But … no 😅 Hover over the "?" icon and see:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Default: read-only access to Storage and Service Management, write access to Stackdriver Logging and Monitoring, read/write access to Service Control.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words, creating a GCE instance with default settings means you can't write to storage &lt;em&gt;even if&lt;/em&gt; the default service account has write permissions.&lt;/p&gt;

&lt;p&gt;You have two options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Go with full access according to permissions: &lt;em&gt;Allow full access to all Cloud APIs&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Customize each service: &lt;em&gt;Set access for each API&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I went with the former, as I'm OK relying on the service account's permissions. It's nice to know that a more security-conscious environment could lock the account down to just what's needed for the particular case (vs everything the account can do).&lt;/p&gt;
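&lt;p&gt;For scripted setups, the same choice is made with the &lt;code&gt;--scopes&lt;/code&gt; flag on instance creation (instance name and zone below are placeholders):&lt;/p&gt;

```shell
# Create an instance whose access scopes don't restrict API access,
# leaving authorization entirely to the service account's IAM roles.
# "cloud-platform" is the alias for the full-access scope.
gcloud compute instances create my-instance \
  --zone=us-central1-a \
  --scopes=cloud-platform
```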

&lt;p&gt;🔐&lt;/p&gt;

&lt;p&gt;After this change, I can create VMs that can read/write storage. Ahh 😌&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>googlecloud</category>
      <category>security</category>
    </item>
    <item>
      <title>Improve TensorFlow model load time by ~70% using HDF5 instead of SavedModel</title>
      <dc:creator>David Haley</dc:creator>
      <pubDate>Thu, 11 Jul 2024 03:55:16 +0000</pubDate>
      <link>https://forem.com/dchaley/improve-tensorflow-model-load-time-by-70-using-hdf5-instead-of-savedmodel-5c8e</link>
      <guid>https://forem.com/dchaley/improve-tensorflow-model-load-time-by-70-using-hdf5-instead-of-savedmodel-5c8e</guid>
      <description>&lt;p&gt;In our &lt;a href="https://dev.to/dchaley/series/27298"&gt;ongoing work&lt;/a&gt; running DeepCell on Google Batch, we noted that it takes ~9s to load the model into memory, whereas prediction (the interesting part of loading the model) takes ~3s for a 512x512 image.&lt;/p&gt;

&lt;p&gt;The ideal runtime environment is serverless, so we don't have long-lived processes that could load the model once and then predict many samples across many jobs. Instead, each task instance has to load the model before doing any work. It hurts when loading the model takes 3x as long as the actual work… and it makes scaling horizontally, with one short-lived compute node per prediction, inefficient.&lt;/p&gt;

&lt;p&gt;My local machine (a MacBook Pro with an M3 Max) took ~12 s to load the model, making it the slowest part of the entire preprocess → predict → postprocess pipeline.&lt;/p&gt;

&lt;p&gt;I was curious why it took so long to load the model into memory. It's "only" ~100 MB on disk.&lt;/p&gt;

&lt;p&gt;I came across &lt;a href="https://towardsdatascience.com/tensorflow-performance-loading-models-fb2d0dc340a3" rel="noopener noreferrer"&gt;TensorFlow Performance: Loading Models&lt;/a&gt; by Libor Vanek. It compares the load times for different formats. Here's the punchline:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhf5nbhaqkbfxmr6avsgz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhf5nbhaqkbfxmr6avsgz.png" alt="Chart of load times for SavedModel vs HDF5 showing a drop from ~10s to ~2s" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I was intrigued 🤞🏻 Could we get similar speed-ups just by changing the format?&lt;/p&gt;

&lt;p&gt;Yes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;SavedModel&lt;/th&gt;
&lt;th&gt;HDF5&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MacBook Pro (M3 Max)&lt;/td&gt;
&lt;td&gt;12.3 s&lt;/td&gt;
&lt;td&gt;0.84 s&lt;/td&gt;
&lt;td&gt;-11.46 s (-93%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n1-standard-8 w/ 1 T4 GPU&lt;/td&gt;
&lt;td&gt;8.99 s&lt;/td&gt;
&lt;td&gt;2.68 s&lt;/td&gt;
&lt;td&gt;-6.31 s (-70%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n1-standard-32 w/ 1 T4 GPU&lt;/td&gt;
&lt;td&gt;8.21 s&lt;/td&gt;
&lt;td&gt;2.72 s&lt;/td&gt;
&lt;td&gt;-5.49 s (-67%)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Of note, loading the model into memory used to take ~3x the time of prediction. Now, it's roughly the same.&lt;/p&gt;
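&lt;p&gt;Measuring this is just wall-clock timing around the load call. A minimal stdlib-only sketch of the pattern (in practice the callable was &lt;code&gt;tf.keras.models.load_model(...)&lt;/code&gt; for each format):&lt;/p&gt;

```python
# Minimal timing sketch: wrap any zero-argument callable and report
# wall-clock seconds via a monotonic high-resolution clock.
import time


def time_call(fn):
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Example with a stand-in for the model-loading call:
_, seconds = time_call(lambda: sum(range(1000)))
```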

&lt;p&gt;Converting the model was easy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Load the SavedModel version
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/Users/davidhaley/.keras/models/MultiplexSegmentation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Save as HDF5
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MultiplexSegmentation-resaved-20240710.h5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We needed to adjust one thing: the &lt;code&gt;load_model&lt;/code&gt; call needs an additional parameter to locate custom training objects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepcell.layers.location&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Location2D&lt;/span&gt;

&lt;span class="c1"&gt;# [...]
&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;custom_objects&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Location2D&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Location2D&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We learned this by importing the HDF5 file without the &lt;code&gt;custom_objects&lt;/code&gt; and getting the error that &lt;code&gt;Location2D&lt;/code&gt; wasn't found.&lt;/p&gt;

&lt;p&gt;This is the only caveat we've found with the HDF5 format: needing to tell it where to find the custom objects. The prediction results appear to be the same.&lt;/p&gt;
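&lt;p&gt;"Appear to be the same" here means numerically equal within floating-point tolerance. A small sketch of the check (the two prediction arrays are placeholders for outputs from the SavedModel-loaded and HDF5-loaded models):&lt;/p&gt;

```python
# Sketch: verify two models produce numerically identical predictions.
# pred_a / pred_b stand in for outputs of model.predict(...) from the
# SavedModel and HDF5 versions, respectively.
import numpy as np


def predictions_match(pred_a, pred_b, rtol=1e-5, atol=1e-6):
    """True if the two prediction arrays agree within tolerance."""
    return np.allclose(pred_a, pred_b, rtol=rtol, atol=atol)
```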

&lt;p&gt;A 70% improvement, just by using a different file format!&lt;/p&gt;

</description>
      <category>tensorflow</category>
      <category>performance</category>
      <category>cloud</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
