<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Arman Tarkhanian</title>
    <description>The latest articles on Forem by Arman Tarkhanian (@armantark).</description>
    <link>https://forem.com/armantark</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1090347%2F5fc9f875-3b1c-4a06-8b44-82a300fcfa05.jpg</url>
      <title>Forem: Arman Tarkhanian</title>
      <link>https://forem.com/armantark</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/armantark"/>
    <language>en</language>
    <item>
      <title>2024-08-13: Finally out</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Tue, 13 Aug 2024 21:50:40 +0000</pubDate>
      <link>https://forem.com/armantark/2024-08-13-finally-out-ncg</link>
      <guid>https://forem.com/armantark/2024-08-13-finally-out-ncg</guid>
      <description>&lt;p&gt;Last week was a pretty hectic week.&lt;/p&gt;

&lt;p&gt;After being somewhat demotivated the past couple weeks, I finally got inspired to push through and get the entire decision engine done.&lt;/p&gt;

&lt;p&gt;I've had this concept brewing in my mind since Comigo, so it's great to actually put it into action.&lt;/p&gt;

&lt;p&gt;The idea is that you start off in a "lobby," and every message sent while you're still in the lobby gets passed through a "decision engine" in the backend, which is essentially just a smaller LLM that produces a simple output from the user's input. This happens behind the scenes. Based on that output, the backend then swaps out the prompt dynamically and starts you on another process.&lt;/p&gt;

&lt;p&gt;I had the value indicating which prompt to use stored in the database, which was a simple and elegant enough solution. I also took the liberty of making sure the code was extensible to more and more prompts, so that's all taken care of.&lt;/p&gt;
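
&lt;p&gt;For the curious, here's a minimal sketch of the shape of that flow. This is just my illustration, not the production code: classify_intent stands in for the smaller LLM call, and the prompt keys and session dict are made up for the example.&lt;/p&gt;

```python
# Illustrative sketch of a lobby decision engine: a small classifier's
# output selects which system prompt to load. All names are made up.

PROMPTS = {
    "lobby": "You are in the lobby. Ask the user what they need help with.",
    "sleep": "You are a sleep coach. Guide the user through sleep issues.",
    "stress": "You are a stress coach. Help the user unpack their stressor.",
}

def classify_intent(message):
    """Stand-in for the smaller LLM; returns a prompt key."""
    text = message.lower()
    if "sleep" in text:
        return "sleep"
    if "stress" in text:
        return "stress"
    return "lobby"

def route_message(message, session):
    """Swap the active prompt based on the decision engine's output."""
    if session["prompt_key"] == "lobby":
        session["prompt_key"] = classify_intent(message)  # persisted to the DB in practice
    return PROMPTS[session["prompt_key"]]

session = {"prompt_key": "lobby"}
prompt = route_message("I have trouble with sleep lately", session)
```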

&lt;p&gt;I was pretty excited to get through it, though.&lt;/p&gt;

&lt;p&gt;However, for some bad news: I met with the CEO on Friday, it turned into a somewhat heated argument over wages and such, and I've since left. My time at this stealth company is over, but luckily, I'm hopefully starting a new contract position soon at Invisible, the company my former coworker (whose duties were fulfilled and contract completed) referred me to. I'll be working as a prompt engineer.&lt;/p&gt;

&lt;p&gt;That being said, I also have an upcoming job at ICF, hopefully starting early September. That'll be exciting as well.&lt;/p&gt;

&lt;p&gt;Either way, one door closes and another journey begins. I'll naturally keep updating this blog. Hopefully I'll be more motivated now that I'll actually be getting paid, and by a bigger company at that.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-07-29: Uneventful #2</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Mon, 29 Jul 2024 16:15:43 +0000</pubDate>
      <link>https://forem.com/armantark/2024-07-29-uneventful-2-3012</link>
      <guid>https://forem.com/armantark/2024-07-29-uneventful-2-3012</guid>
      <description>&lt;p&gt;The past two weeks have been pretty uneventful, again. It's mostly just me trying to implement that decision engine, but honestly, I've been lacking motivation recently due to not having been paid. Our product meetings have also been canceled, so it's a bit weird just roughing it out on my own and with this new product manager we have.&lt;/p&gt;

&lt;p&gt;My approach so far with the decision engine is to have a message parser that takes any user message (ideally in parallel) and turns it into a "command," which gets stored in the database; that selection then switches the prompt out for the necessary one. I also have a "lobby" which basically just nudges you to describe a problem.&lt;/p&gt;

&lt;p&gt;Anyway, that's kinda all where I'm at. Hopefully I have more to report next time. Till then, cheers.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-07-15: Uneventful</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Tue, 16 Jul 2024 04:43:24 +0000</pubDate>
      <link>https://forem.com/armantark/2024-07-15-uneventful-3hd</link>
      <guid>https://forem.com/armantark/2024-07-15-uneventful-3hd</guid>
      <description>&lt;p&gt;So... I was on vacation two weeks ago, and last week was extremely uneventful.&lt;/p&gt;

&lt;p&gt;I honestly don't have much to report other than that we're now working on the very beginnings of a decision tree/engine in the bot. The chat will automatically switch out prompts based on what the user is saying.&lt;/p&gt;

&lt;p&gt;It'll be interesting once it's fully in place. That's what we're working on primarily this week.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-06-17: CoT prompting</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Tue, 18 Jun 2024 18:48:12 +0000</pubDate>
      <link>https://forem.com/armantark/2024-06-17-cot-prompting-3jjj</link>
      <guid>https://forem.com/armantark/2024-06-17-cot-prompting-3jjj</guid>
      <description>&lt;p&gt;Seems like we still have a fire lit under our butts. v0 was apparently not satisfactory enough, so my primary goal last week to make a more fully-fledged version of CoT (as I mentioned last week) so that the AI can actually think about its responses before querying the user for further details.&lt;/p&gt;

&lt;p&gt;It was a bit of a technical lift to get it done. The way our project is set up is a little weird, in that we're using Django for routing, yet for some reason all our LLM logic is localized to the serializer and apps.py files, even though it really should be refactored into its own modules. I had to do a bunch of theorycrafting and planning to come up with a design that I thought would work for the CoT. Obviously, sending that whole massive chunk of thinking to the user is bad for UX, so it needs to happen under the hood. My idea was to do a CoT pass and THEN have a second step refine it into a single question or statement, so I set out to do that. I ended up having to keep two separate memories, though, and we apparently have like 5 or 6 variables dedicated just to handling the memory/conversation/chat history, so it got real confusing real fast.&lt;/p&gt;
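
&lt;p&gt;To make the two-step flow concrete, here's a toy sketch of the idea (my illustration only; llm is a placeholder for the real completion call, and the two history lists stand in for our actual memory variables):&lt;/p&gt;

```python
# Sketch of the hidden two-step flow: a chain-of-thought pass the user
# never sees, then a pass that reduces it to a single question. Two
# histories are kept so the visible chat stays clean while the CoT
# context persists across turns.

def llm(system, history, user):
    """Placeholder for the real model call; returns canned text."""
    if "step by step" in system:
        return f"Reasoning about: {user}. Key unknown: their timeline."
    return "What timeline are you working with?"

def respond(user_msg, chat_history, cot_history):
    # Step 1: private chain-of-thought, appended only to cot_history.
    thoughts = llm("Think step by step.", cot_history, user_msg)
    cot_history.append(thoughts)
    # Step 2: refine the reasoning into one user-facing question.
    question = llm("Reduce the reasoning to one question.", chat_history, thoughts)
    chat_history.append(question)
    return question

chat, cot = [], []
reply = respond("I want to plan a project", chat, cot)
```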

&lt;p&gt;Fortunately, I managed to untangle all that, and one of the other backend engineers also showed me how to add a new column to the database for this CoT message so it can pick up where it left off upon reload or when switching conversations. All this took like 12 hours on Tuesday.&lt;/p&gt;

&lt;p&gt;So it was working, but it needed some further engineering to make the prompt actually do a good job of improving the UX. So I met with the product team several times to just jam it out and add and tweak the prompts until it was up to the standard we wanted. The only major issue is that it's SLOW, but I guess that's the point. They preferred it take 5 seconds to generate a response over it being suboptimal.&lt;/p&gt;

&lt;p&gt;Anyway, so all that was good and proper; then came the menace of trying to merge it back into dev so that everyone could try it. The rest of the dev team had been working on implementing text streaming for the chat messages, and they did it through SSE instead of simply doing it over a websocket, so the code was absolutely grotesque.&lt;/p&gt;

&lt;p&gt;On top of that, instead of relying on version control, the one guy primarily working on it decided to make an entirely separate file/routing system just for the streaming, and I'd been working out of the "vanilla" files this whole time. I immediately told them to start "zipping" it all together, and that turned into a whole mess on Friday because they didn't understand what I meant for some reason, so that one dev just slapped the "new" methods into the original file instead of actually reworking the functionality. Eventually the lead dev stepped in and did it himself, and I was able to merge in my code. It didn't work properly, but by the time I had to sign off on Friday (it was a busy weekend for me), it was about 90% there, so I left it for the other guys to work on.&lt;/p&gt;

&lt;p&gt;They ended up fixing it (mostly), but then a whole mess happened with threading that I'll cover next blog post, because it's technically been this week, not last week. The reason we were in such a rush to get this out, as I mentioned, was to get investors hooked. Hopefully it was all worth it.&lt;/p&gt;

&lt;p&gt;Until next time, cheers.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-06-10: v0 complete</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Mon, 10 Jun 2024 16:18:25 +0000</pubDate>
      <link>https://forem.com/armantark/2024-06-10-v0-complete-3dil</link>
      <guid>https://forem.com/armantark/2024-06-10-v0-complete-3dil</guid>
      <description>&lt;p&gt;So, once again, I neglected to write a blog post last week. I don't want to make this a habit, but I guess it's been so busy these past two weeks that I didn't want to bother writing a whole post last week.&lt;/p&gt;

&lt;p&gt;We've been focusing on getting our v0 (aka alpha version) out so that investors and such can start testing the app.&lt;/p&gt;

&lt;p&gt;To generally cover what we've gotten done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;iOS app is out on TestFlight&lt;/li&gt;
&lt;li&gt;revamped prompts&lt;/li&gt;
&lt;li&gt;speech-to-text&lt;/li&gt;
&lt;li&gt;two new architectures/prompts beyond what we had before&lt;/li&gt;
&lt;li&gt;a chat history sidebar&lt;/li&gt;
&lt;li&gt;selection of the architectures&lt;/li&gt;
&lt;li&gt;message limits&lt;/li&gt;
&lt;li&gt;a bunch of other small improvements/bugfixes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Generally, I've been working with my fellow prompt engineer and the others on the product team to get the prompts to do what we want, as has been the goal for the past 3 months. I believe I mentioned last time that we were transitioning from using fine-tune data to using just prompts for now; we need more resources to do fine-tuning.&lt;/p&gt;

&lt;p&gt;That being said, I also designed a script to get two AI models to talk to each other in a loop until the conversation naturally finishes. This was mostly to test that our prompts are adequate, but I can also take the transcripts and adapt them into fine-tune data. The major issue is that it costs a lot, and I'm actually not sure the conversations it produces are better in terms of quality.&lt;/p&gt;
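
&lt;p&gt;The loop itself is simple; here's a toy sketch of the pattern (not the actual script: agent stands in for a real chat-completion call, and the canned replies and closing-phrase check are made up for illustration):&lt;/p&gt;

```python
# Sketch of two models conversing in a loop until a natural stop.
# agent() is a placeholder model; here a canned script ends the chat
# after a few turns via a closing phrase.

CLOSING = "goodbye"

def agent(name, history):
    """Placeholder model: signs off once it has already spoken twice."""
    turns = sum(1 for speaker, _ in history if speaker == name)
    if turns == 2:
        return f"Thanks, {CLOSING}!"
    return f"{name} turn {turns}: tell me more."

def converse(max_turns=20):
    history = [("user_bot", "Hi, I need some help.")]
    speakers = ["assistant_bot", "user_bot"]
    for i in range(max_turns):
        speaker = speakers[i % 2]
        msg = agent(speaker, history)
        history.append((speaker, msg))
        if CLOSING in msg.lower():
            break  # the conversation wrapped itself up
    return history

transcript = converse()
```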

&lt;p&gt;Anyway, another thing we're considering is the fact that there's no chain-of-thought reasoning possible in our app. We are designing it so that the bot asks the shortest, most concise questions, which leaves it no room to reason over the previous context. This causes the bot to follow the prompt and goals we laid out a little too rigidly instead of following a natural flow. To mitigate this, I'm thinking of designing a two-step process where it first produces a long-winded response with chain-of-thought reasoning, then reduces that down to a single question. This way, it can follow its own logic better.&lt;/p&gt;

&lt;p&gt;So that's what's happened so far. Hopefully it turns out for the better. Until next time, cheers.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-05-21: Trouble and Turmoil</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Wed, 22 May 2024 04:01:48 +0000</pubDate>
      <link>https://forem.com/armantark/2024-05-21-trouble-and-turmoil-4971</link>
      <guid>https://forem.com/armantark/2024-05-21-trouble-and-turmoil-4971</guid>
      <description>&lt;p&gt;Last week was pretty hectic. We figured that the model was definitely overfitted, because we had written out data for only the beginnings of conversations, hoping that the "natural" instincts of GPT-3.5 would be able to take over after that point, but unfortunately it would continue questioning in an endless cycle. There were even spelling mistakes and other typos, which, while I did not notice while reviewing the data, may have ultimately come from there.&lt;/p&gt;

&lt;p&gt;Another major issue was that the model had a relatively large validation loss. I suspect this is because I was forced to do a 90/10 split on training vs. validation; I didn't want to do k-fold cross-validation because I figured 85 data points would be enough (and it would be difficult to code up, unless there's a library out there specifically for OpenAI's fine-tuning tool). Regardless, 9 validation conversations isn't enough to get a reliable estimate of the loss. The numbers ultimately came out to 0.8240 training loss and 1.7145 full validation loss (oof).&lt;/p&gt;
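
&lt;p&gt;For concreteness, a 90/10 split on a set this small looks roughly like this (a toy sketch, not our actual tooling; the 85-conversation dataset is simulated, and integer rounding is why validation lands at only 8 or 9 conversations):&lt;/p&gt;

```python
# Sketch of a 90/10 train-validation split over a small fine-tune set.
# Each record mimics one conversation in OpenAI's chat fine-tune shape.
import random

def split_dataset(conversations, val_fraction=0.1, seed=42):
    data = list(conversations)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    n_val = max(1, int(len(data) * val_fraction))
    return data[n_val:], data[:n_val]

# 85 conversations, as in the round described above (stand-in data).
dataset = [{"messages": [{"role": "user", "content": f"example {i}"}]}
           for i in range(85)]
train, val = split_dataset(dataset)
```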

&lt;p&gt;That meant the model was definitely overfitted, which we could discern almost immediately, considering the AI would not change course even after something completely irrelevant was thrown its way (e.g. literally correcting its grammar).&lt;/p&gt;

&lt;p&gt;Anyway, so we set out to create far more data points for a second round of fine-tuning, making sure to have (a) a lot of validation data, (b) lots of full conversations, not just beginnings, and (c) ~200 new conversations.&lt;/p&gt;

&lt;p&gt;So I generated all that plus a bunch of new categories for validation and handed it off to the product team so they could start crafting all of that. As I write, I'm still waiting on them to finish it.&lt;/p&gt;

&lt;p&gt;We also found that we need to do a crunch on getting v0 (what they're calling alpha) out to demo to investors. The CEO was really pushing for two new "architectures" (basically what we're calling our proprietary applied prompt+fine-tune sets) in addition to porting to iOS and, of course, making it have our brand voice. Needless to say, this caused a bit of a riot, because it was a lot of work to squeeze into the span of a week, plus none of us has gotten paid yet.&lt;/p&gt;

&lt;p&gt;With the rest of the team, we planned out what we needed to do for the coming week. I also met with our new "fractional integrator," who is supposed to help streamline organizational processes, which we were definitely lacking. I talked with her about some things I wanted to see so we could optimize communication internally with the engineering team, since we were really lacking on that front. I'm not sure how much I've mentioned it, but most of the engineers are working out of Europe, particularly Germany, and they are very much against having actual synchronous meetings, partly because of the time zone difference but mostly, I feel, because they don't want to collaborate and would rather work alone.&lt;/p&gt;

&lt;p&gt;Anyway, this has been causing a lot of issues. They also don't write documentation of any sort, so all the code we have is just written as-is and is hard to navigate, even after chucking everything into ChatGPT to help explain it. So when I talked to our integrator, I told her that we need a code style guide that must be strictly followed, and that I would take on the responsibility of writing it.&lt;/p&gt;

&lt;p&gt;Since I've had some kinda-sorta downtime while waiting on the rest of the product team to write up that data, I've also been writing that code style guide, trying to include as much detail as possible so things don't get overlooked. I want to apply it to code reviews as well, so it can be enforced.&lt;/p&gt;

&lt;p&gt;The integrator and I also discussed removing Jira from our workflow because it feels too detached from the product team, and so I think we're in the process of keeping it all unified on Notion.&lt;/p&gt;

&lt;p&gt;We also worked out a "sprint" system (not velocity-based) for situations like this where we're in a time crunch. Either way, we're racing against the clock so that our CEO can demo the app to some big-name investors, who aren't even technologically inclined.&lt;/p&gt;

&lt;p&gt;Finally, we discussed removing a bunch of useless channels on Slack because they were clogging up the sidebar immensely. That one is just a routine maintenance thing.&lt;/p&gt;

&lt;p&gt;Things were really coming along great, though, until, out of nowhere on Friday, our head research scientist left because of an argument she had with the CEO. So now we're scrambling to find a replacement (most likely) while also missing the core member of our product team who helped write training data and guide the architectures.&lt;/p&gt;

&lt;p&gt;So hopefully things get smoothed over and she comes back, or we quickly find someone new who can be brought into the loop fast. Thankfully, though, it did buy us a week to organize all this new stuff the CEO is asking of us, so at least we have that going for us.&lt;/p&gt;

&lt;p&gt;Anyway, that's all for now. Until next time, cheers.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-05-13: Skipped a week</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Mon, 13 May 2024 16:54:24 +0000</pubDate>
      <link>https://forem.com/armantark/2024-05-13-skipped-a-week-1gp9</link>
      <guid>https://forem.com/armantark/2024-05-13-skipped-a-week-1gp9</guid>
      <description>&lt;p&gt;So... I neglected to write last week because, honestly, not much happened. It was just some development on the secret project, and I can't really detail what that was about so I figured it was a waste of time to write a post. Regardless, it's kind of on pause now until we find a proper API for what we want, so there's that.&lt;/p&gt;

&lt;p&gt;Last week's major theme was fine-tuning. The Friday before last, I walked the product team through how to start filling out the conversations I had generated for them, using basically the same script I had from my Lawgoat days. So they were on that for a couple days.&lt;/p&gt;

&lt;p&gt;In the meantime, I tried fiddling around with RAG because the backend engineer was having issues with the bot not actually using our real prompt. It would just behave like default GPT-4, pulling up random info from the books. None of it was really documented, so I spent a lot of time trying to figure out what was even going on.&lt;/p&gt;

&lt;p&gt;At some point, I had to give up because the fine-tuning looked to be the most important thing right now. We reckoned that RAG wasn't really applicable to our use case at the moment anyway. Someday down the line I'd like to rewrite it all from scratch, because I'm not entirely sure our backend engineer actually understands how LangChain works.&lt;/p&gt;

&lt;p&gt;Anyway, so we transitioned back to focusing on fine-tuning, and I spent a lot of Thursday and most of Friday just grooming the finer details of the data to make sure it was all ready. Thursday, we also had a product team call about v0 and what needs to be included in it. The most notable takeaway was that we could use few-shot prompting to generate the second part of the process.&lt;/p&gt;

&lt;p&gt;I noticed that one JSON file of our fine-tune data didn't have enough examples, so I took that few-shot idea and folded it into my generator script, and it produced significantly higher-quality examples. I had the team fix that file up, though, and by Friday I was good to start writing a bunch of post-processing scripts to make things a lot easier.&lt;/p&gt;
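
&lt;p&gt;To give a flavor of what those post-processing scripts do, here's a toy sketch (not the real scripts; the field names follow OpenAI's chat-format JSONL, but the cleaning rules shown are illustrative):&lt;/p&gt;

```python
# Sketch of a post-processing pass over fine-tune data: normalize
# whitespace and drop malformed conversations before writing JSONL.
import json

def clean_example(example):
    """Return a cleaned copy, or None if the conversation is unusable."""
    cleaned = []
    for msg in example.get("messages", []):
        content = " ".join(str(msg.get("content", "")).split())
        if not content:
            return None  # empty turn: drop the whole conversation
        cleaned.append({"role": msg["role"], "content": content})
    if cleaned and cleaned[-1]["role"] == "assistant":
        return {"messages": cleaned}
    return None  # keep only conversations ending on an assistant turn

raw = [
    {"messages": [{"role": "user", "content": "  hi   there "},
                  {"role": "assistant", "content": "Hello!"}]},
    {"messages": [{"role": "user", "content": "   "}]},
]
cleaned = [ex for ex in (clean_example(r) for r in raw) if ex]
jsonl = "\n".join(json.dumps(ex) for ex in cleaned)
```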

&lt;p&gt;Once everything was post-processed and checked for grammar/awkwardness/spelling/punctuation, I finally threw it into the fine-tuning interface on OpenAI's website. Ten minutes later, we had our new model (way faster than I expected).&lt;/p&gt;

&lt;p&gt;It definitely was better in terms of voice, but it might still need some work depending on our testing. But either way, it's definitely a step in the right direction.&lt;/p&gt;

&lt;p&gt;Hopefully this trend continues. Anyway, until next time, cheers.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-04-29: It's gonna be May</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Wed, 01 May 2024 03:21:30 +0000</pubDate>
      <link>https://forem.com/armantark/2024-04-29-its-gonna-be-may-2ai8</link>
      <guid>https://forem.com/armantark/2024-04-29-its-gonna-be-may-2ai8</guid>
      <description>&lt;p&gt;I have some exciting news to share: finally getting pre-seed funds in within 3 weeks, which is great.&lt;/p&gt;

&lt;p&gt;Let's detail what I got done last week:&lt;/p&gt;

&lt;p&gt;Monday and Tuesday were pretty light and same as usual; I spent more time doing some research. I continued my work on that RAG prompt the other devs wanted me to give them so that it could retrieve the proper documentation for what we want it to do. I had some trouble figuring out how I wanted to organize it (like which file to put it in and where, since the codebase is kind of a jumbled mess at the moment), so I just gave it as text to one of them, and hopefully he put it where it needs to go. I also took the time to research the best way to chunk the text in the documents, since apparently the other dev did it naively instead of by paragraphs/sections (so that semantically similar things are more reliably grouped together). He was complaining about latency in the chat part of the application, so it's currently in the post-processing part only. That needs to be optimized for sure.&lt;/p&gt;
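
&lt;p&gt;The paragraph-aware chunking I'm suggesting is roughly this (a sketch of the general pattern, not our actual code; the character budget and splitting on blank lines are my assumptions):&lt;/p&gt;

```python
# Sketch of paragraph-aware chunking, as opposed to a naive fixed-size
# split: whole paragraphs are packed into chunks up to a size budget,
# so semantically related text tends to stay together.

def chunk_by_paragraphs(text, max_chars=500):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = (current + "\n\n" + para) if current else para
        if len(candidate) > max_chars and current:
            chunks.append(current)  # budget exceeded: close this chunk
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

doc = ("First paragraph about sleep.\n\n"
       "Second paragraph about stress.\n\n" + "x" * 490)
chunks = chunk_by_paragraphs(doc)
```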

&lt;p&gt;Wednesday I took off for Armenian Genocide Remembrance Day, so not much to report there.&lt;/p&gt;

&lt;p&gt;When I got back on Thursday, I met with the product team again, and we started talking about more architectures we want to implement into the system, taking our original prompts and adapting them to the practical applications we're after. Again, I have to be vague since the topic of the application is under wraps, but all I can say is that it's a wellness application, not unlike Comigo. Anyway, after that meeting, I did some more research. I needed more to go off of for the architectures, so I waited on them to give me that (which still hasn't happened as of this writing, but it's fine).&lt;/p&gt;

&lt;p&gt;Friday, the CEO came across a cool video on YouTube that demonstrated a technology that he wants to add to the platform. Now, he really wants to keep this one a secret, so I unfortunately am not going to be able to talk about this one at all, but he wants me to spearhead that effort as a future product feature, which is going to be great. With this, the news about funding, and a new coworker that I've been closely working with, I have a renewed enthusiasm for the role here, which is awesome.&lt;/p&gt;

&lt;p&gt;But anyway, that's all for this week. Until next time, cheers. &lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-04-23: Interviews</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Tue, 23 Apr 2024 18:59:33 +0000</pubDate>
      <link>https://forem.com/armantark/2024-04-23-interviews-4ae7</link>
      <guid>https://forem.com/armantark/2024-04-23-interviews-4ae7</guid>
      <description>&lt;p&gt;Last week was pretty slow at the startup. We're kind of in a stasis at the moment until we get funding.&lt;/p&gt;

&lt;p&gt;In the meantime, we are trying to get the RAG up and running, but we ran into issues with the model's latency and with some RAG prompting that needed to be added to what we have so that the model can properly access the data. I've been working on that. I think I'm going to have to redo the embedding into the vector database, because our backend engineer followed a somewhat ham-fisted approach that works for testing purposes but isn't necessarily robust.&lt;/p&gt;

&lt;p&gt;However, a lot of my time last week was also dedicated to studying for a technical interview I had on Friday. I took a lot of time just watching videos on DSA and system design and taking notes for Leetcode exercises. Thankfully, I did pass the interview on Friday, so I'm looking forward to this new opportunity. I also had 4 other interviews/screeners, so I prepared for those.&lt;/p&gt;

&lt;p&gt;Otherwise, I don't really have much to report. Hopefully more for next week. Cheers.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-04-15: Machine learning</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Tue, 16 Apr 2024 16:02:12 +0000</pubDate>
      <link>https://forem.com/armantark/2024-04-15-machine-learning-20ab</link>
      <guid>https://forem.com/armantark/2024-04-15-machine-learning-20ab</guid>
      <description>&lt;p&gt;Last week was a fairly slow week. Two of the main psychologists were out doing personal things so I was basically on my own.&lt;/p&gt;

&lt;p&gt;One of the things that needed to get done was improving one of the prompts for generating the second part of the app, but obviously the psychology is not my responsibility or domain, so I naturally had to wait for an opportune time to meet with the psychologists once they came back.&lt;/p&gt;

&lt;p&gt;In the meantime, I spent a lot of my time learning more about LLMs and ML. For LLMs, I was following DeepLearning.AI's new course on parsing unstructured data; for ML, freeCodeCamp's course. Both have taught me a lot thus far.&lt;/p&gt;

&lt;p&gt;I also had two interviews but nothing super interesting to report in that regard.&lt;/p&gt;

&lt;p&gt;Friday, I was met with a bug report from our marketer, who said that past message history was carrying over into a new chat. Though I wasn't able to reproduce her issue with the raw code, I figured it was probably due to latency on the actual website, which wouldn't show up when running on my local machine with its near-instantaneous connections. My hunch was confirmed when I introduced some artificial latency and started chatting before the chat history had actually been cleared from memory.&lt;/p&gt;

&lt;p&gt;I got to work implementing some thread locks so that the chat can't be accessed while the chat history is still being cleared. It was a mild pain because the syntax for it is a bit strange in Python, but I eventually got it working flawlessly, and I even started adding some logging for important data-flow points in the code, which will be useful in the future.&lt;/p&gt;
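
&lt;p&gt;The core of the fix is simple; stripped of the Django plumbing, it amounts to something like this (my sketch, not the actual code; the function names and single shared list are illustrative):&lt;/p&gt;

```python
# Sketch of the locking fix: clearing the history and appending a new
# message share one lock, so a chat request cannot race a clear that
# is still in flight.
import threading

history_lock = threading.Lock()
chat_history = []

def clear_history():
    with history_lock:
        chat_history.clear()

def handle_message(text):
    # Blocks here until any in-flight clear has finished.
    with history_lock:
        chat_history.append(text)
        return list(chat_history)

worker = threading.Thread(target=clear_history)
worker.start()
worker.join()
snapshot = handle_message("hello")
```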

&lt;p&gt;Anyway, that's kinda it for last week. Hopefully I have some more interesting stuff to write for next week. Cheers.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-04-08: Meetings galore</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Tue, 09 Apr 2024 19:48:58 +0000</pubDate>
      <link>https://forem.com/armantark/2024-04-08-meetings-galore-3515</link>
      <guid>https://forem.com/armantark/2024-04-08-meetings-galore-3515</guid>
      <description>&lt;p&gt;Truth be told, this last week wasn't super interesting. We are kind of in a lull at the moment due to waiting for the funds to roll in from investors, and the CEO is closing some deals, so we're kind of just feeling around for our plans for the future once he can become more involved again.&lt;/p&gt;

&lt;p&gt;Last week started off with me generating 3 different conversations that had varying numbers of pre-included example questions for the bot to draw "inspiration" from. One of the quirks of these LLM systems is that they tend to take the path of least resistance and will start quoting examples verbatim instead of just using them as a springboard, and it's very hard to circumvent this behavior without actually getting in and fine-tuning the model.&lt;/p&gt;

&lt;p&gt;So my goal was to see how the bot responded to having zero example questions vs. the original 40 that the CEO had proposed vs. the 10 I'd whittled them down to. I think it gave good insight into how much freedom we want the bot to have. It seemed to go off on tangents more when there were no examples, but 40 felt too rigid. 10 is probably the sweet spot, even though it still struggles to come up with questions outside of those 10. Perhaps I could whittle them down even further.&lt;/p&gt;

&lt;p&gt;Either way, beyond that, I spent my time learning more about machine learning. There's a good course by freeCodeCamp, and likewise 3blue1brown has been releasing a deep learning series revolving specifically around LLMs; it's very intriguing seeing all the math play out. I learned a lot from that. I'm also figuring out the best approach to how we want to feed in the fine-tuning data.&lt;/p&gt;

&lt;p&gt;Thursday I met up with all three of the psychologists to discuss architecture again and also bring the newest one into the loop as to what we're going to be working on. I told him specifically that we want to curate some training data for conversations for the AI and his help as a specialist in what we're doing is perfect for that.&lt;/p&gt;

&lt;p&gt;I read some articles on prompt engineering that were pretty useful, incorporated some of the tips into the chat prompt, and it worked better.&lt;/p&gt;

&lt;p&gt;The next day I set out to work on a prompt to generate a bunch of example conversations to work off of, kind of like I'd done at Lawgoat, but this time for the purpose of imbuing the voice we want. It's going to be a slog, but it's important.&lt;/p&gt;

&lt;p&gt;And that's pretty much all that happened last week. Till next time, cheers.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>2024-04-02: Prompt optimizations</title>
      <dc:creator>Arman Tarkhanian</dc:creator>
      <pubDate>Tue, 02 Apr 2024 17:27:11 +0000</pubDate>
      <link>https://forem.com/armantark/2024-04-02-prompt-optimizations-32fk</link>
      <guid>https://forem.com/armantark/2024-04-02-prompt-optimizations-32fk</guid>
      <description>&lt;p&gt;Hey guys. So yesterday was a holiday, and I requested the day off so I'm writing this blog post for today.&lt;/p&gt;

&lt;p&gt;Anyway, last week was pretty productive. My job as of late has been to optimize the prompts along with the two psychologists so that we can present it in acceptable form to investors.&lt;/p&gt;

&lt;p&gt;Monday was a continuation of our Friday meeting. We met with the CEO to bring forth our ideas, and he thought it was all good. We explained our plans for the architecture and what we want to do with the fine-tuning, RAG, etc. In that regard, it was pretty fruitful.&lt;/p&gt;

&lt;p&gt;The next day we continued with our planning, and I met with the CEO one-on-one to share my thoughts directly and also discuss compensation. He did say that he may be able to reach what I'm asking for, but time will tell how successful the funding will be. I also talked with him about organizational optimizations, because I felt that there was quite literally no structure to how we're working and everything feels somewhat nebulous, especially with how cagey the other developers are. They're extremely reluctant to share anything for whatever reason. Maybe it's a cultural/language barrier, since they're from Germany, but either way, it's not pleasant at all. I tend to thrive in environments where communication flows freely.&lt;/p&gt;

&lt;p&gt;On Wednesday, I had the technical interview with Zillow. I had been preparing by doing Leetcode problems, since I'd read online that they mostly ask mediums. I will admit that Leetcode is not a strong suit of mine, and I definitely don't believe it's an accurate representation of coding skill or problem-solving ability. I certainly can't work that fast, and for most people it's just a matter of memorizing the patterns. Either way, when the interview started, I was met with a very easy string-parsing problem, which was definitely not a Leetcode medium.&lt;/p&gt;

&lt;p&gt;I got stuck on edge cases, so I didn't pass all the test cases, but I did well enough, in my opinion. After that was a system architecture question about how I'd implement the previous problem as a system. I had my architecture pretty fleshed out too, but I somehow forgot that queuing services exist (mostly because I've never had a reason to use them in my work), so I might have gotten dinged for that.&lt;/p&gt;

&lt;p&gt;Regardless, the next day I was met with a rejection, which is fine; I'm always open to doing interviews, even just as practice and to learn more.&lt;/p&gt;

&lt;p&gt;Thursday I went ahead and implemented Geekbot workflows myself because I didn't want to keep waiting on the lead dev to finally get it done. I also asked the psychologists what they'd think of me becoming the Product Owner, in the Agile sense, and they were definitely receptive to the idea. I'd already somewhat squeezed myself into that role, since I was the one constantly talking to the non-technical folks as a developer myself, so it wasn't much of a leap. I really like organizing tasks and keeping operations running smoothly.&lt;/p&gt;

&lt;p&gt;I then conferred with the CEO again, and he said it sounds like a good idea, but that we should wait until we have a good version of the prompt up and running before we start reorganizing the roles and whatnot. I figured that was reasonable, so that's what I continued to work on.&lt;/p&gt;

&lt;p&gt;The backend developer brought to my attention that the chat history was being shared among every single client, which was bad for obvious reasons, and the chat history was also carrying over between chats. I immediately got to work on that.&lt;/p&gt;

&lt;p&gt;I thought of keeping a running dictionary with user IDs as keys and the message histories as values, and that worked out fine. The bigger issue, however, was clearing out the message history when a new chat started. That involved going into the frontend to change the buttons' functionality and pass along the user ID, then creating a new Django backend endpoint to handle the button clicks and delete the history.&lt;/p&gt;
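&lt;p&gt;Here's a minimal sketch of that idea, with hypothetical names rather than the actual project code: a per-user history dict plus a clear function that the new endpoint would call.&lt;/p&gt;

```python
# Sketch: per-user chat histories keyed by user ID, with a clear
# operation for the "new chat" button. Names are illustrative only.
from collections import defaultdict

chat_histories = defaultdict(list)  # user_id -> list of messages

def add_message(user_id, message):
    # Append to this user's history only, so clients never share state.
    chat_histories[user_id].append(message)

def clear_history(user_id):
    # Triggered by the frontend button via the new backend endpoint.
    chat_histories.pop(user_id, None)

add_message("user-1", "hello")
add_message("user-2", "hi there")
clear_history("user-1")
```

&lt;p&gt;The key point is simply that the history is scoped per user ID instead of living in one shared list, and a new chat drops that user's entry.&lt;/p&gt;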

&lt;p&gt;It was pretty rough considering all the typing that had to take place, but eventually I got it all working and it came out well. I pushed it out and then got to work on some prompt changes. The previous night I'd found a cool Reddit post about how to fix up the language ChatGPT uses, so I used some of the suggestions there (particularly about using Flesch reading scores to keep it simple but not simplistic), and it came out pretty decent. The major issue left was the annoying "mirroring" ChatGPT tends to do when it acknowledges someone's input, e.g. "I like turtles," to which the bot replies, "&lt;strong&gt;It's great that you like turtles!&lt;/strong&gt; What do you like about them?" The bold part is what we want to avoid.&lt;/p&gt;
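&lt;p&gt;For anyone curious, the Flesch Reading Ease score is just a formula over sentence length and syllables per word. A rough sketch, using a crude syllable heuristic rather than a real NLP library:&lt;/p&gt;

```python
# Rough Flesch Reading Ease calculator. The syllable counter is a
# heuristic (vowel groups, trailing silent 'e'), so scores are approximate.
import re

def count_syllables(word):
    word = word.lower()
    if word.endswith("e") and not word.endswith("le"):
        word = word[:-1]  # drop a likely-silent trailing 'e'
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    # Standard formula: higher scores mean easier-to-read text.
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
```

&lt;p&gt;Scores around 60-70 read like plain English; running the bot's replies through something like this gives a quick sanity check that the prompt changes are actually keeping the language simple.&lt;/p&gt;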

&lt;p&gt;We've been working on culling that behavior, but it's been a hassle since it's so baked into the system. No matter what we say in the prompt, it ignores it. We may need to resort to fine-tuning, but it's hard to come up with data at this stage; we'd have to write it manually, which sucks.&lt;/p&gt;

&lt;p&gt;On Friday, I was determined to keep working on the prompt with one of the psychologists (the other had called in sick), but I was hit with a sinus headache myself, which forced me to call in sick as well.&lt;/p&gt;

&lt;p&gt;Anyway, that's a recap of this past week. Until next time, cheers.&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
  </channel>
</rss>
