Python is the language of choice for most Machine Learning code, but I prefer to run my back-end services using NodeJS. I'm currently working on a Node-based project where I need to call the HuggingFace Diffusers module, so I have implemented an IPC wrapper around the Python script.
It works by sending JSON messages back and forth over stdin/stdout, using U+0000 (NULL, \0) as a delimiter. This is one of the few characters that must be escaped in valid JSON, so we can be confident that no response will ever contain it.
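As a quick sanity check (this snippet is just for illustration, not part of the project code), JSON serializers always escape a raw NULL, so the delimiter can never show up inside a message body:
// JSON escapes control characters, so U+0000 never appears raw in the output
console.log(JSON.stringify({ text: 'contains a \0 byte' }));
// -> {"text":"contains a \u0000 byte"}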
Since the Python script handles requests one at a time, we can send it an arbitrary number of messages and they will be buffered by the stdin pipe until the process gets to them.
Responses from the Python process are manually buffered until we receive the null byte and then decoded.
The Node Side
import { spawn } from 'child_process';
import { dirname, join } from 'path';
import { fileURLToPath } from 'url';
// Start the Python process
const __dirname = dirname(fileURLToPath(import.meta.url));
const pythonProcess = spawn('python3', [join(__dirname, 'example.py')]);
// Buffer for incomplete messages
let buffer = '';
// Handle incoming messages from Python
pythonProcess.stdout.on('data', (data) => {
  // Split incoming data on null bytes
  const messages = (buffer + data.toString()).split('\0');
  // Store the trailing incomplete message (if any) for the next chunk
  buffer = messages.pop();
  // Process complete messages
  for (const message of messages) {
    try {
      const response = JSON.parse(message);
      console.log('Received from Python:', response);
    } catch (error) {
      console.error('Parse error:', error);
    }
  }
});
// Handle Python errors
pythonProcess.stderr.on('data', (data) => {
  console.error('Python stderr:', data.toString());
});
// Send messages to Python, newline-terminated because the Python side reads line by line with input()
function sendToPython(message) {
  pythonProcess.stdin.write(JSON.stringify(message) + '\n');
}
// Test the communication
sendToPython({ type: "test", message: "Hello Python!" });
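To illustrate the buffering behaviour mentioned above, several requests can be queued back to back. This small sketch (not in the original script) just reuses sendToPython and lets the pipe hold the writes while Python works through them one at a time:
// Queue a few requests in a row; the stdin pipe buffers the writes and the
// Python process handles them in order, one at a time.
for (let i = 1; i <= 3; i++) {
  sendToPython({ type: "test", message: `Request ${i}` });
}
Each reply comes back through the same stdout handler shown above.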
The Python Side
import sys
import json
import time
def log_message(message):
    # Write messages with a null terminator as the delimiter
    sys.stdout.buffer.write(json.dumps(message).encode() + b'\0')
    sys.stdout.buffer.flush()

if __name__ == "__main__":
    log_message({"status": "ready"})
    # Loop forever to keep the process active
    while True:
        try:
            # Read one line of input from Node
            data = json.loads(input())
            # Echo back with a timestamp
            log_message({
                "status": "success",
                "received": data,
                "timestamp": time.time()
            })
        except EOFError:
            # Node closed stdin, so exit cleanly instead of spinning
            break
        except Exception as e:
            log_message({
                "status": "error",
                "message": str(e)
            })
This is a bare-bones example of how to accomplish this. Stay tuned for more details about this project in the near future.
Top comments (4)
Super clean setup, I always love seeing cross-language hacks come together like this. Have you ever hit issues with buffering or pipes going weird when the volume gets high?
So far I have not seen any issues, but I have not pushed it very hard yet. I'm sending messages over a WebSocket from the browser to Node that carry the parameters for diffusion. I have queued multiple requests across several instances of the client and the result always comes through properly.
I haven't come up with a way to trigger exactly simultaneous requests to see if that will work properly, but I think it should be fine. The requests are sent from Node to Python as a complete JSON body and should never be too large for the buffer.
It's the outputs that end up coming through in chunks due to the size of the base64-encoded image data. But the Python script can only process one request at a time, so there is no chance of collisions.
The cover looks great! Looking forward to more details.
I know, right! They look so happy together. 😄 Thanks for reading!