<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Hilal Eylul</title>
    <description>The latest articles on Forem by Hilal Eylul (@hileyl).</description>
    <link>https://forem.com/hileyl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2391539%2F6dc41a20-14ff-4f0c-b593-8c6e436ed19c.png</url>
      <title>Forem: Hilal Eylul</title>
      <link>https://forem.com/hileyl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/hileyl"/>
    <language>en</language>
    <item>
      <title>10 Things You Need to Know about DeepSeek R1 (As an ML Engineer)</title>
      <dc:creator>Hilal Eylul</dc:creator>
      <pubDate>Sun, 02 Feb 2025 10:12:50 +0000</pubDate>
      <link>https://forem.com/hileyl/10-things-you-need-to-know-about-deepseek-r1-as-an-ml-engineer-dcd</link>
      <guid>https://forem.com/hileyl/10-things-you-need-to-know-about-deepseek-r1-as-an-ml-engineer-dcd</guid>
      <description>&lt;p&gt;At first, it seems like I’m a bit late to the party. &lt;/p&gt;

&lt;p&gt;DeepSeek R1 was introduced this past month as an LLM that rivals state-of-the-art LLMs from companies like OpenAI. &lt;/p&gt;

&lt;p&gt;Needless to say, DeepSeek R1 has been getting a lot of attention recently. &lt;/p&gt;

&lt;p&gt;Over the past two years, we’ve seen a hyper-obsession with LLMs, after OpenAI released the chatbot known as ChatGPT in late 2022. Ever since, there has been a very publicized race to release the fastest and most accurate model. Of course, the players were some of the biggest companies and research centers in the world. &lt;/p&gt;

&lt;p&gt;Now, DeepSeek R1 is a model that rivals and even sometimes outperforms the state-of-the-art models we’ve seen so far. And that’s not even why DeepSeek R1 is so exceptional compared to all the other state-of-the-art LLMs. DeepSeek R1 is revolutionary because its algorithm and its open source release are integral to AI’s democratization. &lt;/p&gt;

&lt;p&gt;That’s why every ML/AI professional needs to become familiar with DeepSeek R1 - both its inner workings and the potential effects it has on the future of AI. &lt;/p&gt;

&lt;p&gt;To start, here are ten things every ML engineer needs to know about DeepSeek R1.&lt;/p&gt;

&lt;h2&gt;Reinforcement Learning is a game changer for LLMs.&lt;/h2&gt;

&lt;p&gt;Previous state-of-the-art LLMs have also incorporated reinforcement learning into their training. The traditional method for training LLMs relies on reinforcement learning from human feedback (RLHF). However, the paper that introduced DeepSeek R1 showed that pure reinforcement learning, without human feedback in the loop, can greatly improve a model’s reasoning capabilities over time. &lt;/p&gt;

&lt;p&gt;Why is pure reinforcement learning any better? &lt;/p&gt;

&lt;p&gt;The paper that introduced DeepSeek R1 addressed a major challenge in AI: training models without relying on large datasets of labeled reasoning examples. The model trained with reinforcement learning alone proved that extensive supervised training is not strictly necessary. &lt;/p&gt;

&lt;p&gt;After all, large datasets with the right information are difficult to obtain and expensive to compile. Why not build a model that you can teach over time as you gain access to more information? Reasoning through reinforcement learning is a huge contribution which can help researchers build flexible LLMs that learn more on demand. &lt;/p&gt;

&lt;p&gt;There is another major difference between a model that uses pure reinforcement learning and one that uses RLHF. The paper compares the results of two variations of the former to those of two OpenAI-o1 models. A line graph shows the accuracy of all four models over a period of time. &lt;/p&gt;

&lt;p&gt;The results from the OpenAI-o1 models are static. At first, those two models greatly outperform the two models developed by the DeepSeek researchers. &lt;/p&gt;

&lt;p&gt;However, the two models that only use reinforcement learning are dynamic. They get better over time, and eventually rival or even beat the OpenAI-o1 models. If given more time, they will possibly surpass the two models that performed better initially.&lt;/p&gt;

&lt;h2&gt;It has an invisible sibling known as DeepSeek R1-Zero.&lt;/h2&gt;

&lt;p&gt;Did you know that the DeepSeek R1 paper actually introduces two models? In addition to the model with which we are familiar, R1, it introduces another model known as R1-Zero. The paper establishes that this model is truly the backbone of R1. &lt;/p&gt;

&lt;p&gt;There are several major differences between R1-Zero and R1. The former uses pure reinforcement learning, which is the strategy that we talked about in the previous section. Yes, it was actually R1-Zero that outperformed the OpenAI-o1 models and proved that reinforcement learning can remove the need for large datasets. &lt;/p&gt;

&lt;p&gt;R1 does not use pure reinforcement learning, but it doesn’t rely on huge datasets either.&lt;/p&gt;

&lt;p&gt;The training pipeline for R1-Zero is very simple. The first step is unsupervised pretraining. Then the next and last step is GRPO, which is an optimization process that improves the model through reward signals. &lt;/p&gt;

&lt;p&gt;For R1, the training pipeline is similar but of course more complex, incorporating cold start finetuning and supervised finetuning (SFT). Cold start finetuning trains on a few thousand examples of reasoning problems, some of which R1-Zero generated and filtered. Supervised finetuning presents the model with training examples in the form of prompt and correct completion. &lt;/p&gt;

&lt;p&gt;Clearly the researchers went through a lot of effort to build the R1 model. Why was R1 even necessary, especially considering that R1-Zero demonstrates very high performance and learning ability?  &lt;/p&gt;

&lt;p&gt;The reason is that even though R1-Zero eventually learns how to answer questions, it isn’t very usable. For example, the model struggles with poor readability and even mixes languages in its responses. Thus, R1 was introduced as a more usable model. &lt;/p&gt;

&lt;p&gt;Nonetheless, R1 and R1-Zero are both publicly available as open source models.&lt;/p&gt;

&lt;h2&gt;DeepSeek R1 uses GRPO, which improves upon the PPO algorithm used by OpenAI.&lt;/h2&gt;

&lt;p&gt;The DeepSeek R1 paper introduces Group Relative Policy Optimization (GRPO) in order to score how well the model responds to a question without having the correct answer. This is an improvement upon Proximal Policy Optimization (PPO), the RLHF algorithm that OpenAI used for its models. PPO trains LLMs using reward signals with the goal of improving results. &lt;/p&gt;

&lt;p&gt;For the record, GRPO also aims to improve a model through reward signals. &lt;/p&gt;

&lt;p&gt;A key component of PPO is something called a value model. The challenge in RLHF is that the model only sees the reward after evaluating the full text, but it needs feedback for each token it generates. The value model addresses this challenge by learning to predict future rewards at each token position. &lt;/p&gt;

&lt;p&gt;However, GRPO gets rid of the value model while still ensuring that its implementation of reinforcement learning is highly effective. &lt;/p&gt;

&lt;p&gt;Rather than using a value model, GRPO samples multiple outputs for each query instead of the single output used by PPO. Each output receives a reward, and an advantage is then computed for each output by comparing its reward to the group’s average reward. &lt;/p&gt;

&lt;p&gt;For example, if you sample n outputs for a math problem, and one solution has a reward of 0.9 while the group average is 0.7, then that solution gets a positive advantage of 0.2. &lt;/p&gt;

&lt;p&gt;GRPO thus provides a natural way to determine whether an output to a query is better or worse than average, without the trouble of training a value model to predict future rewards.&lt;/p&gt;
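
&lt;p&gt;The advantage computation above can be sketched in a few lines of Python (a minimal illustration; the paper additionally normalizes by the group’s standard deviation, and the reward values here are made up):&lt;/p&gt;

```python
def group_advantages(rewards):
    """GRPO-style advantages: each sampled output's reward
    relative to the group's average reward."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Four sampled outputs for the same math problem, scored by a reward model.
rewards = [0.9, 0.7, 0.6, 0.6]
advantages = group_advantages(rewards)  # the 0.9 output gets advantage +0.2
```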

&lt;h2&gt;Chain of thought greatly reduces the chances of the model making mistakes.&lt;/h2&gt;

&lt;p&gt;Chain of thought (CoT) is one of the three main concepts in the DeepSeek R1 paper, along with reinforcement learning and model distillation. It can be defined as the step during which the model reasons before presenting the solution. &lt;/p&gt;

&lt;p&gt;The concept is also highly relevant to prompt engineering. We can ask the model to essentially think out loud. To at least some extent, the model will return an explanation of its reasoning before reaching a conclusion. &lt;/p&gt;
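
&lt;p&gt;As a minimal illustration of this kind of prompting (the exact wording is my own, not from the paper):&lt;/p&gt;

```python
question = "A train travels 120 km in 2 hours. What is its average speed?"

# Appending an instruction like this nudges the model to reveal its
# reasoning before it commits to an answer.
cot_prompt = (
    question
    + "\nThink step by step, and explain your reasoning"
    + " before giving the final answer."
)
```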

&lt;p&gt;The R1-Zero version of DeepSeek actually reveals its chain of thought in its answer, as shown in the study. Other similarly performing models, such as the OpenAI models, don’t do this. &lt;/p&gt;

&lt;p&gt;There is a major advantage to this type of response. If the model makes a mistake, we can easily pinpoint where in its reasoning it was incorrect. Then, we can re-prompt the model so that it doesn’t make the same mistake again. &lt;/p&gt;

&lt;p&gt;As a result, the model produces a more accurate response than if it answered directly, without chain of thought reasoning. Ultimately, this can help LLMs give more reliable answers and even avoid hallucinations. &lt;/p&gt;

&lt;p&gt;A variety of posts on Twitter and YouTube have shown that DeepSeek R1 can even solve the infamous “strawberry” problem.&lt;/p&gt;

&lt;h2&gt;A major priority is for the model to maximize its reward, while placing a limit on the change in its policy.&lt;/h2&gt;

&lt;p&gt;As we discussed in an earlier section, the optimization process is given by PPO or GRPO. If you recall, GRPO is the optimization process in the DeepSeek R1 study. &lt;/p&gt;

&lt;p&gt;Within the optimization process, there is a policy model. It is the policy model that takes the query and generates one output for PPO and multiple outputs for GRPO.  &lt;/p&gt;

&lt;p&gt;The model’s policy is defined as how the model behaves. In reinforcement learning, the goal is to optimize the model’s policy while training the model. Optimizing the model’s policy means maximizing its reward. &lt;/p&gt;

&lt;p&gt;The idea of a model gaining more information is akin to a sentient being exploring its environment. Over time, the model learns which policies maximize the reward and adjusts its behavior accordingly. &lt;/p&gt;

&lt;p&gt;For example, consider that there may be two ways to solve an equation. However, one solution can be attained much more quickly and easily than the other solution. Thus, the quicker and easier solution has a much higher reward. &lt;/p&gt;

&lt;p&gt;The GRPO optimization process in the DeepSeek R1 study is given by an equation. If you look at the equation in the paper, the π variable represents the policy. Because we want to optimize the policy, we essentially want to change it so the model can yield better answers. &lt;/p&gt;

&lt;p&gt;However, we don’t want to change the policy too much, because that can cause a lot of instability during training. The goal is for our model to be as stable as possible and avoid a roller coaster of policy changes. &lt;/p&gt;

&lt;p&gt;This is where clipping becomes relevant. Clipping restricts the policy ratio to values between 1 - ε and 1 + ε. This clipping function appears in the study’s main equation, which uses GRPO to score the model’s response to a given query without having the correct answer.&lt;/p&gt;
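
&lt;p&gt;A minimal sketch of the clipped objective, assuming the usual PPO/GRPO formulation (where the ratio is the new policy’s probability of an output divided by the old policy’s):&lt;/p&gt;

```python
def clipped_objective(ratio, advantage, epsilon=0.2):
    """Clip the policy ratio to [1 - epsilon, 1 + epsilon] and take the
    pessimistic (smaller) of the unclipped and clipped terms."""
    clipped_ratio = min(max(ratio, 1 - epsilon), 1 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)

# A ratio of 1.5 is clipped to 1.2, capping the incentive
# to move the policy too far in a single update.
objective = clipped_objective(1.5, advantage=1.0)
```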

&lt;h2&gt;Unlike the o1 model by OpenAI, DeepSeek R1 reveals what happens between the &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags.&lt;/h2&gt;

&lt;p&gt;For those familiar with HTML and other markup languages, the &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tag will be especially easy to understand. The user experience for an LLM prompt and response looks something like this: the user asks a question; the LLM reasons about its response inside the &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags; then the LLM directly answers the question.&lt;/p&gt;

&lt;p&gt;The chain of thought which we mentioned in an earlier section is the content inside of the &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags. Depending on the complexity of the task, the chain of thought can be quite long. &lt;/p&gt;

&lt;p&gt;The o1 model by OpenAI no doubt uses chain of thought to answer challenging questions, including coding tasks and Math Olympiad problems. However, OpenAI never disclosed what happens within the &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags. &lt;/p&gt;

&lt;p&gt;The DeepSeek R1 model and its variations are different. The paper shows an example of the R1-Zero model’s chain of thought when it answers a math question. The model even demonstrates an “aha moment” which shows that it can re-evaluate its thought process in an anthropomorphic tone.&lt;/p&gt;

&lt;h2&gt;The model can outperform state-of-the-art models like GPT-4o and Claude 3.5 Sonnet at a small fraction of the memory and storage.&lt;/h2&gt;

&lt;p&gt;This is where we get into the third main concept in the paper, after reinforcement learning and chain of thought. Model distillation involves training a smaller model (the student) to behave similarly to a larger model (the teacher). &lt;/p&gt;

&lt;p&gt;The advantage of model distillation is that the student model can perform well at a small fraction of the size. For example, DeepSeek R1 has 671 billion parameters, while a distilled version has only 7 billion. The goal is to make the model accessible to people who don’t have an expensive server (e.g. one worth around ten thousand dollars). &lt;/p&gt;
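
&lt;p&gt;A back-of-the-envelope calculation shows why that parameter count matters (assuming 16-bit weights, i.e. 2 bytes per parameter; real memory use also depends on activations and the KV cache):&lt;/p&gt;

```python
def weights_memory_gb(n_params, bytes_per_param=2):
    """Rough memory needed just to hold the model weights."""
    return n_params * bytes_per_param / 1e9

full_model = weights_memory_gb(671e9)  # full DeepSeek R1: ~1342 GB
distilled = weights_memory_gb(7e9)     # 7B distilled variant: ~14 GB
```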

&lt;p&gt;In their experiments, the DeepSeek researchers found that the smaller distilled models largely outperform larger models like GPT-4o and Claude 3.5 Sonnet. The distilled models excel in math, coding, and scientific reasoning tasks, and they accomplish this at a small fraction of the memory and storage required to run them.&lt;/p&gt;

&lt;h2&gt;There are four reasons why the model runs so efficiently.&lt;/h2&gt;

&lt;p&gt;It is believed that DeepSeek made training 45x more efficient. There is an ongoing debate as to whether that’s even possible. However, &lt;a href="https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda" rel="noopener noreferrer"&gt;there appear to be four reasons&lt;/a&gt; why DeepSeek R1 and its variations run so efficiently. &lt;/p&gt;

&lt;p&gt;The first reason is that the training process used 8-bit instead of 32-bit floating-point numbers, which saved a great deal of memory. &lt;/p&gt;
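
&lt;p&gt;The saving is easy to quantify (a rough sketch for a hypothetical 7-billion-parameter model; in practice only some tensors are kept in 8-bit):&lt;/p&gt;

```python
n_params = 7e9
fp32_gb = n_params * 4 / 1e9  # 32-bit floats: 4 bytes each -&gt; 28 GB
fp8_gb = n_params * 1 / 1e9   # 8-bit floats: 1 byte each   -&gt;  7 GB
saving_factor = fp32_gb / fp8_gb  # 4x less memory for the same tensors
```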

&lt;p&gt;The second reason is the model’s multi-token prediction system. Most Transformer-based LLMs carry out inference by predicting the next token, one token at a time, whereas DeepSeek’s system predicts several tokens at once. The quality of multi-token prediction turned out to be no worse than that of single-token prediction. &lt;/p&gt;

&lt;p&gt;This approach appears to have doubled inference speed and achieved about 85-90% accuracy. &lt;/p&gt;

&lt;p&gt;The third reason involves compression of the key-value (KV) cache, which eats up much of the VRAM. The KV cache stores the per-token keys and values used by the Transformer’s attention mechanism. The model learns a compressed representation of these entries that captures the essential information while using far less memory. &lt;/p&gt;

&lt;p&gt;Thus, the DeepSeek R1 study suggests that it’s wasteful to store the full KV cache, even though that’s what everyone else does.&lt;/p&gt;
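
&lt;p&gt;The idea can be sketched as a simple down-projection (a toy stand-in: in the actual model the compression is learned jointly with training, and the dimensions here are invented):&lt;/p&gt;

```python
import random

def down_project(vector, proj):
    """Project a high-dimensional key/value vector to a smaller
    latent vector, which is what gets cached per token."""
    return [sum(w * x for w, x in zip(row, vector)) for row in proj]

d, r = 64, 8  # hypothetical full dim and compressed latent dim
proj = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]
kv = [random.gauss(0, 1) for _ in range(d)]

latent = down_project(kv, proj)  # 8 floats cached instead of 64
```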

&lt;p&gt;The fourth reason is the incorporation of Mixture of Experts into the DeepSeek R1 architecture. Specifically, the model is a stack of 61 Transformer decoder blocks. The first three are dense, and the remaining 58 are Mixture of Experts layers. &lt;/p&gt;

&lt;p&gt;Because the model’s architecture includes Mixture of Experts, the large model is effectively decomposed into smaller expert subnetworks, only a few of which are active for any given token. Hence the efficiency, and the accessibility of the distilled models.&lt;/p&gt;
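
&lt;p&gt;Routing in a Mixture of Experts layer can be sketched as follows (a toy version: the real gate is a learned network, and the scores here are made up):&lt;/p&gt;

```python
def route_top_k(gate_scores, k=2):
    """Pick the k experts with the highest gate scores; only those
    experts' feed-forward networks run for this token."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return ranked[:k]

# Hypothetical gate scores for 8 experts on one token.
scores = [0.10, 0.70, 0.05, 0.30, 0.90, 0.20, 0.15, 0.40]
active = route_top_k(scores)  # experts 4 and 1 handle this token
```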

&lt;h2&gt;You can probably run a version of the model on your local machine(s), but it may be a challenge.&lt;/h2&gt;

&lt;p&gt;Several versions of DeepSeek R1 are available on &lt;a href="https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;DeepSeek R1 and DeepSeek R1-Zero are available, as I mentioned in an earlier section. The current version of each has 685 billion parameters. &lt;/p&gt;

&lt;p&gt;There are also several distilled versions at 70 billion parameters, 32 billion parameters, 14 billion parameters, 8 billion parameters, 7 billion parameters, and 1.5 billion parameters. Of course, these are supposed to run on machines that are more affordable and accessible. &lt;/p&gt;

&lt;p&gt;David Plummer, a retired software engineer formerly at Microsoft, talked about running DeepSeek R1 in a &lt;a href="https://www.youtube.com/watch?v=r3TpcHebtxM&amp;amp;pp=ygUjZGVlcHNlZWsgcmV0aXJlZCBtaWNyb3NvZnQgZW5naW5lZXI%3D" rel="noopener noreferrer"&gt;YouTube video&lt;/a&gt;. He said that the main version, which had 671 billion parameters at the time, can run on an AMD Threadripper equipped with an NVIDIA RTX 6000 Ada GPU with 48 GB of VRAM (total cost is around ten to fifteen thousand USD). &lt;/p&gt;

&lt;p&gt;However, he also says that the 32 billion parameter distilled version runs nicely on a MacBook Pro.&lt;/p&gt;

&lt;h2&gt;There is an effort to re-build DeepSeek R1 and share it with the open source community.&lt;/h2&gt;

&lt;p&gt;Although the open source model itself is available online, the company never published the training code or the datasets. &lt;/p&gt;

&lt;p&gt;There is a project called Open-R1 whose goal is to replicate all parts of the DeepSeek R1 pipeline. It is a community-driven effort to fully understand what kind of algorithms, code, and data are necessary to emulate the high performance of DeepSeek R1. The best part is that this is a build-in-public project, so all information will be publicly available. &lt;/p&gt;

&lt;p&gt;Also, Open-R1 will likely eliminate &lt;a href="https://www.youtube.com/watch?v=7TR-FLWNVHY&amp;amp;feature=youtu.be" rel="noopener noreferrer"&gt;the concern&lt;/a&gt; that your data might go to DeepSeek even when running the model locally. &lt;/p&gt;

&lt;p&gt;In conclusion, DeepSeek R1 is not perfect. There are still concerns, even with regard to running the model locally. However, I think this study will pave the way for democratization and privacy in AI.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>deepseek</category>
      <category>nlp</category>
    </item>
    <item>
      <title>How to Kill It on LinkedIn without Making a Fool of Yourself</title>
      <dc:creator>Hilal Eylul</dc:creator>
      <pubDate>Sun, 17 Nov 2024 13:07:55 +0000</pubDate>
      <link>https://forem.com/hileyl/how-to-kill-it-on-linkedin-without-making-a-fool-of-yourself-b5o</link>
      <guid>https://forem.com/hileyl/how-to-kill-it-on-linkedin-without-making-a-fool-of-yourself-b5o</guid>
      <description>&lt;p&gt;We had youtubers on YouTube, we had “influencers” on Instagram, and by now we’ve got TikTokers on TikTok. But there is another breed of internet famous. It is becoming more and more prominent as of the early to mid-2020s. &lt;/p&gt;

&lt;p&gt;Welcome to the age of the LinkedIn influencer. &lt;/p&gt;

&lt;p&gt;As the name suggests, the LinkedIn influencer is a type of “influencer” on social media. Except the platform is LinkedIn, a space for professionals to network. It was launched in 2003, in the good old early days of the internet, back before “I love this makeup palette, not sponsored!” was a thing. &lt;/p&gt;

&lt;p&gt;That’s right. LinkedIn went from being a low-key website that had maintained the nostalgic utilitarianism of the early 2000s to just another brain-rot-propagating social media platform. &lt;/p&gt;

&lt;p&gt;I don’t know when the term “LinkedIn influencer” was invented. Google Trends suggests that 2022 was when its usage saw an uptick in popularity. &lt;/p&gt;

&lt;p&gt;Regardless, you can probably thank the LinkedIn influencer for the surge of aggravating, demotivational, and overall cringe content on your main feed. Oh, and the fact that LinkedIn now promotes short-form videos doesn’t help. &lt;/p&gt;

&lt;p&gt;So how does the average person respond to this? The introvert, the low-key person, and the everyman? After all, LinkedIn is known as the place to be if you want to further your career. &lt;/p&gt;

&lt;p&gt;I would argue that LinkedIn is still essential for anyone who wants to level up. I would even go as far as to argue that you should be active on the platform. It’s just that there is a way to go about it. &lt;/p&gt;

&lt;p&gt;There is a fine line between actions that lead to killing it on LinkedIn and those that lead to making a fool of yourself on the platform. Oftentimes, only the former scenario yields desirable results. &lt;/p&gt;

&lt;p&gt;If you want to optimize the time you devote to LinkedIn, you need to do these things. &lt;/p&gt;

&lt;h2&gt;Know that trying to become an influencer on LinkedIn is not worth it&lt;/h2&gt;

&lt;p&gt;Let’s get the first question out of the way. Is trying to become a LinkedIn influencer even worth it? The answer is a surprising “probably not.” &lt;/p&gt;

&lt;p&gt;Nonetheless, the idea prevails that posting often on LinkedIn and getting lots of views on your profile is the key to getting job offers. For a long time, I used to think that this was the case. That was until I recently came across a very specific post on LinkedIn. &lt;/p&gt;

&lt;p&gt;The user was a data scientist who boasted “ex-Meta” and “ex-Amazon” on her profile and had over 67,000 followers. In her post, she said that “the ROI of posting on LinkedIn is so low.” This might actually have been what influenced me to write this article. &lt;/p&gt;

&lt;p&gt;I guess a social media “influencer” does have the potential to influence people after all. &lt;/p&gt;

&lt;p&gt;Jokes aside, that one sentence she wrote changed the way that I looked at the whole idea of becoming a LinkedIn influencer. Because the truth is, I had entertained the idea of trying to become one myself lately. &lt;/p&gt;

&lt;p&gt;Why? Because like any other gold rush, lots of people are doing it. And the return on investment, at least for the people whom I came across, seems to have been much higher than the previously mentioned data scientist suggested. &lt;/p&gt;

&lt;p&gt;As with other domains, you don’t want to follow the crowd that is obsessed with trends. You want to start the trends. Those who are around when something is just starting to become popular are usually the ones who end up with a reasonable chance of success. &lt;/p&gt;

&lt;p&gt;As far as I’m concerned, this whole “LinkedIn influencer” trend is no different. &lt;/p&gt;

&lt;p&gt;Our biases get in the way of how we perceive reality. For every person who became super successful in her career after becoming a LinkedIn influencer, there are probably tons of other LinkedIn users who are not succeeding at all, wasting time, and making fools of themselves in the process. Survivorship bias perfectly describes this type of situation. &lt;/p&gt;

&lt;p&gt;Not only do I not enjoy gossip, but I try to avoid it as much as possible. However, I can only imagine the type of things coworkers and other acquaintances say about someone who is not benefiting from his or her attempts at becoming a LinkedIn influencer. &lt;/p&gt;

&lt;p&gt;“She’s trying to become one of those LinkedIn influencers.” &lt;/p&gt;

&lt;p&gt;“Just another try-hard who writes cringe posts instead of actually focusing on what’s important.”&lt;/p&gt;

&lt;p&gt;“How sad.”&lt;/p&gt;

&lt;p&gt;Now I’m definitely not suggesting that you need to live your life according to other people’s comments. Becoming an independent thinker is crucial to achieving self-actualization. &lt;/p&gt;

&lt;p&gt;However, we still need to consider the effects that our actions have on others as we navigate our careers. &lt;/p&gt;

&lt;p&gt;If the main focus of our careers is just to gain more attention and notoriety so we can have a chance at “faking it until we make it” to high salaries and lofty perks, then it’s not really worth it in my opinion. &lt;/p&gt;

&lt;p&gt;Also, avoiding this “LinkedIn influencer” trend will probably result in fewer posts that spread negativity, hopelessness, and sheer annoyance. That itself, especially in this day and age, is a highly valuable contribution. &lt;/p&gt;

&lt;h2&gt;Realize early on that your network and followers alone won’t get you anywhere in this field&lt;/h2&gt;

&lt;p&gt;“Your network is your net worth.”&lt;/p&gt;

&lt;p&gt;This is probably the one saying that any young professional has heard at least a few hundred times. Is it true? The short answer is yes, but at least in the field of computer science, this is only true if you are good at your job and are an effective communicator. &lt;/p&gt;

&lt;p&gt;For some time, this was a bit difficult for me to believe, especially considering some of the things I’ve seen so far. &lt;/p&gt;

&lt;p&gt;The industry is a meritocracy but also it isn’t. &lt;/p&gt;

&lt;p&gt;I once came across a post on LinkedIn by a user who worked as a machine learning engineer at a Fortune 500 company in the United States. To the untrained eye, the post seemed to contain very helpful advice for any ML/AI professional. As a mid-level machine learning engineer, I quickly realized from the contents of the post that this woman had a very weak understanding of her own field. &lt;/p&gt;

&lt;p&gt;To make things even worse, this wasn’t just some employee who had stumbled upon the job. She wasn’t some internal hire who had previously worked under a different job title but in the same location. Rather, she was brought to the United States as foreign “talent” after having worked at a totally different company in her home country of India. &lt;/p&gt;

&lt;p&gt;We hear this all the time, but I’ll say it again. The system is broken. These types of stories happen most likely because some workers in this field have others vouching for them due to an already existing connection. &lt;/p&gt;

&lt;p&gt;The research community in computer science is another setting where neither dishonesty nor gaming the system is a stranger. &lt;/p&gt;

&lt;p&gt;I used to frequent the Adobe Research website, and there was one researcher who stood out. This researcher, originally from western Europe and now working at one of the Adobe research centers in the United States, had published an unusually high number of papers in the past year alone. Needless to say, I was very impressed. &lt;/p&gt;

&lt;p&gt;Not long after, I discovered Andy Stapleton, a YouTuber based in Australia with a PhD in a STEM field. In one of his videos, he talked about the “paper mafias” that are common in the scientific research community: researchers join to ensure that their names appear on any paper that anyone else in the group publishes, regardless of contributions or lack thereof. &lt;/p&gt;

&lt;p&gt;These examples prove that “getting ahead” largely because of connections is a possibility. Yet, I would not consider any of these cases to be examples of genuine success because of the lack of integrity that is involved. &lt;/p&gt;

&lt;p&gt;Here at MLE Resource, we don’t want to do anything that goes against our values nor engage in unethical behavior. However, it is still essential for us to connect with fellow professionals who work in the same field. After all, how would it be possible to grow in your career if no one else knows who you are? &lt;/p&gt;

&lt;p&gt;The thing is, you want to be noticed by relevant people in your area of work. Your dream connections, if you will. Just keep in mind that the reason they notice you is equally important. &lt;/p&gt;

&lt;p&gt;Getting noticed &lt;strong&gt;&lt;em&gt;for existing&lt;/em&gt;&lt;/strong&gt; by the right people is not the key to hacking your way into a higher paying job or obtaining more leads for your solopreneur gig. Getting noticed &lt;strong&gt;&lt;em&gt;for your valuable contributions&lt;/em&gt;&lt;/strong&gt; by the right people is what can lead to the opportunities you are looking for. &lt;/p&gt;

&lt;p&gt;At the end of the day…&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No one cares about your connections or how many followers you have if you cannot do the job.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Identify your talents and then focus on them&lt;/h2&gt;

&lt;p&gt;What are your talents? Moreover, what are your interests? &lt;/p&gt;

&lt;p&gt;No, I am not just talking about your job title. I am not even talking about your tech stack or your area of focus. &lt;/p&gt;

&lt;p&gt;Every job has multiple responsibilities and necessary skills: coding, designing, keeping track of KPIs and progress, writing documentation, mentoring, giving presentations, maintaining rapport with partners and stakeholders, and more. &lt;/p&gt;

&lt;p&gt;We all have our talents and interests. I would even argue that there is a correlation between those two things. &lt;/p&gt;

&lt;p&gt;As you become more interested in something and work to improve yourself in that domain, you eventually end up offering more valuable contributions. This gives the impression that you are talented in that area. And it’s probably true, if you get to a point where your work is of high quality. &lt;/p&gt;

&lt;p&gt;From early on, it is crucial to identify what you do best and the most impactful ways you can contribute to an organization, a project, and a team. When I say impact, I am not suggesting you have to make a change that people will remember and still be affected by 10 years later. What I mean is: focus on what you can do to ensure reasonable success and help your team reach their goals. &lt;/p&gt;

&lt;p&gt;A lot of people in this field think that more experience is correlated with more marketability. In other words, if you are a junior you will have a harder time finding a job and if you are a senior, job offers are fairly easy to attain. &lt;/p&gt;

&lt;p&gt;As much of a travesty as it is, this does not appear to be the case. Companies are hiring fewer people, and those that are publicly hiring seem to demand “10x engineers” or “unicorns” for senior positions. &lt;/p&gt;

&lt;p&gt;This is where the importance of focusing on your talents becomes especially handy. When it comes time to look for a new job, you don’t rely on the same old resume filled with the same old skills and tasks. There will be hundreds of other applicants claiming that they are no different. To stand out, you need to highlight how you, as proven by your abilities and past work, are crucial to the success of the organization. &lt;/p&gt;

&lt;p&gt;Consider Chip Huyen, for example. Her claim to fame was teaching a course on using TensorFlow for deep learning when she was an undergrad at Stanford. Based on her story, you can tell that the only reason she became the instructor was because she was highly interested in the topic and took the initiative to launch the course in spite of how difficult it was. &lt;/p&gt;

&lt;p&gt;Although Huyen did actually work in AI research at some point, she eventually quit. Now, she is known as a world expert in machine learning system design. Rather than filling her resume with the same topics as everybody else, she decided to focus on the areas where she knew others would value her contributions. &lt;/p&gt;

&lt;p&gt;If you want to find the people who will value your contributions, growing your network is essential. &lt;/p&gt;

&lt;h2&gt;
  
  
  Focus on growing your network
&lt;/h2&gt;

&lt;p&gt;We talked about why it is important to grow your network. The next question is, how exactly do you go about doing that online? &lt;/p&gt;

&lt;p&gt;LinkedIn is the perfect place to find people, and Twitter (which some people think is now called X) is at least as effective. Although Instagram has a small community of people interested in web development, ML/AI engineering, and cybersecurity, it mostly consists of influencers whose target audience is beginners and aspiring techies. &lt;/p&gt;

&lt;p&gt;Once you begin strengthening your talents, the next step is reaching out to fellow professionals. Reaching out to the right people is a very common piece of advice you will come across whenever the topic is growing a professional network. &lt;/p&gt;

&lt;p&gt;This advice is often paired with the insistence that you should reach out to others immediately through a direct message. As of this writing, LinkedIn only allows you to send one message to someone with whom you are not already connected, and even that is typically attached to a connection request. &lt;/p&gt;

&lt;p&gt;Personally, I would not recommend this. Not only do social media platforms such as LinkedIn often discourage cold messages, but many users disable DMs and even connection requests from strangers. But that is not even the main reason I don’t recommend reaching out to others unsolicited. &lt;/p&gt;

&lt;p&gt;The main reason is because it’s not very effective. &lt;/p&gt;

&lt;p&gt;Think about it. Imagine you are at the top of your field, and you have numerous people reaching out to you because they want to add you to their network or ask a specific question. What are the odds that you would remember any random person among them a few weeks later? &lt;/p&gt;

&lt;p&gt;Not very high. &lt;/p&gt;

&lt;p&gt;I would argue that reaching out to someone you don’t already know is a process. Here is how I would go about it. &lt;/p&gt;

&lt;p&gt;First, I would follow the person on at least one of the two social media platforms (LinkedIn and Twitter), if not both. &lt;/p&gt;

&lt;p&gt;Second, I would periodically check the person’s profile for interesting content. If I had anything interesting to say in response to a post, I would leave a comment. It’s worth noting that you should reply only when you have something genuinely insightful and/or useful to say. &lt;/p&gt;

&lt;p&gt;I would continue step two for at least a few months, and I would not move on to the next step until the person had somehow acknowledged me, either by liking at least a few of my comments or by sending at least a few positive replies. &lt;/p&gt;

&lt;p&gt;Last, I would reach out to the person directly. That could include sending a connection request, sending a direct message introducing myself and requesting a video call, or arranging to meet the person at a conference s/he recently announced s/he will be attending. &lt;/p&gt;

&lt;p&gt;Maybe you want to collaborate on something, work at the same company, or just want the person to see some of your recent work because you thought s/he would be interested. &lt;/p&gt;

&lt;p&gt;In my opinion, these steps make up the most effective way to go about connecting with others online. &lt;/p&gt;

&lt;h2&gt;
  
  
  Don’t rely too much on the advice of career gurus on social media
&lt;/h2&gt;

&lt;p&gt;As I always say, there is no shortage of bad advice on the internet. &lt;/p&gt;

&lt;p&gt;That includes career-related advice you might come across on social media or even personal blogs. &lt;/p&gt;

&lt;p&gt;The best thing you can do is to take advice with a grain of salt. &lt;/p&gt;

&lt;p&gt;When I first began my career, I had no idea how to reach out to more established professionals in my field. The thing was that I didn’t know that many people, and I wanted to expand my network. Because I had no idea what to do, I turned to the internet of all places for advice. &lt;/p&gt;

&lt;p&gt;A common theme was that you needed to do whatever you could to get one-on-one attention. &lt;/p&gt;

&lt;p&gt;One really terrible piece of advice that I recall specifically was to connect on LinkedIn with a stranger who also works in your field, and then ask that person to review your resume. This was especially recommended if the person worked at a company in which you were interested. &lt;/p&gt;

&lt;p&gt;One reason that I remember this piece of advice so well was because I was guilty of actually trying it. Surprise, surprise, it didn’t work. The person that I had recently connected with went radio silent. &lt;/p&gt;

&lt;p&gt;About a year later, I came across a post on my main feed from another stranger on the same platform. In it, he ridiculed the people who follow this advice and implied that he ignores them, too. He specifically referred to following this advice as “asking other people to do chores for you.” &lt;/p&gt;

&lt;p&gt;What’s the takeaway? Don’t be too eager to take advice from people on the internet, even if they are known as career gurus.  &lt;/p&gt;

&lt;p&gt;Also, don’t allow yourself to be too easily influenced by your environment. Especially online. &lt;/p&gt;

&lt;h2&gt;
  
  
  Keep calm, and don’t give in to the hype
&lt;/h2&gt;

&lt;p&gt;It can be very easy to be influenced by your environment. Especially when the people around you, or the people on your main feed on LinkedIn or Twitter, are all talking about the same topics. &lt;/p&gt;

&lt;p&gt;If these people are talking about topics relevant to your field, then it is definitely worth paying attention and even publicly responding to those posts. Research in the field of AI is something that I am highly interested in, and it is very relevant to my career. As a result, posts about new papers related to AI research and emerging trends are pretty much always welcome on my main feed. &lt;/p&gt;

&lt;p&gt;The same cannot be said about politics, debates about social norms, and even pop culture. &lt;/p&gt;

&lt;p&gt;Why? &lt;/p&gt;

&lt;p&gt;The first reason is that these topics are not relevant in any way to achieving our goals. Joining these discussions doesn’t help others see your talent, competence, or achievements. &lt;/p&gt;

&lt;p&gt;The second reason is that it is risky. When you involve yourself in political debates, you have roughly a 50% chance of alienating anyone who can see your activity online. &lt;/p&gt;

&lt;p&gt;Even publicly talking about something as seemingly innocuous as pop culture can lead your colleagues to see you as frivolous and lacking in good judgment. &lt;/p&gt;

&lt;p&gt;The third reason is that it is futile. What are the possible advantages of discussing these topics in an online professional setting? &lt;/p&gt;

&lt;p&gt;If you want to bond with your coworkers and friends over these issues, you can do that privately. Meanwhile, if you are doing it just to get attention in the hopes of becoming internet famous, then you haven’t fully accepted why trying to become a social media influencer is not a good idea. &lt;/p&gt;

&lt;p&gt;The other really important thing is to keep calm. Remember that Keep Calm meme from the early 2010s? Maybe not. &lt;/p&gt;

&lt;p&gt;Either way, that is what you need to do on social media. Which is relatively easy if you don’t post often or engage in discussions on controversial issues. &lt;/p&gt;

&lt;h1&gt;
  
  
  The End
&lt;/h1&gt;

&lt;p&gt;Wow, that was a behemoth of a blog post. In all honesty, I wasn’t expecting it to be this long. It wasn’t until I was halfway through writing that I realized how long the post might get. &lt;/p&gt;

&lt;p&gt;Anyway, I hope this article was helpful. Normally, I write about topics related to ML/AI and software. Hopefully the upcoming article will not stray from those topics. &lt;/p&gt;

&lt;p&gt;Until next week! &lt;/p&gt;

</description>
      <category>career</category>
      <category>careerdevelopment</category>
      <category>socialmedia</category>
      <category>linkedin</category>
    </item>
    <item>
      <title>How to Focus on the Right Content when Learning AI/ML</title>
      <dc:creator>Hilal Eylul</dc:creator>
      <pubDate>Sat, 09 Nov 2024 11:42:38 +0000</pubDate>
      <link>https://forem.com/hileyl/how-to-focus-on-the-right-content-when-learning-aiml-41ml</link>
      <guid>https://forem.com/hileyl/how-to-focus-on-the-right-content-when-learning-aiml-41ml</guid>
      <description>&lt;p&gt;I began actively using Linkedin after I got my first job as a Machine Learning Engineer. Eager to learn about the field as much as possible, I began following more senior professionals who shared my enthusiasm. These people were working at a variety of companies, from FAANG companies to universities to growing startups to established companies that have been around for many decades. &lt;/p&gt;

&lt;p&gt;Seeing their posts appear on my feed, I began compulsively liking them. And by that, I specifically mean clicking on the “thumbs up” button. That way, other people knew that I had liked a post, and they could go to my profile and see my history of liked posts on the platform. &lt;/p&gt;

&lt;p&gt;It was like a curation of the type of content that I, Hilal Eylül, found to be interesting and worthwhile. &lt;/p&gt;

&lt;p&gt;Even more importantly, I could go back to these posts later on. If any post seemed to contain information that I didn’t have the bandwidth to examine while scrolling, I could refer to it when I was in a more studious mood. &lt;/p&gt;

&lt;p&gt;Not long after, I began to realize something. &lt;/p&gt;

&lt;p&gt;Some of these posts were not actually very helpful. &lt;/p&gt;

&lt;p&gt;This even includes the external sources that my fellow professionals on LinkedIn mentioned and promoted in their posts. They were not the kind of sources or information that are in any way vital on the path to becoming a more advanced professional in the field of AI and ML. &lt;/p&gt;

&lt;p&gt;If anything, a lot of these posts were self-promotional. Either for the individual or for the company the person worked at. &lt;/p&gt;

&lt;p&gt;That’s exactly why I decided to address this topic. These are the things you can do to filter out useless content on your journey to becoming a better AI/ML Engineer. &lt;/p&gt;

&lt;h2&gt;
  
  
  Learn how to spot covert ads
&lt;/h2&gt;

&lt;p&gt;Let’s face it. A lot of things you see on the internet are secretly just ads. &lt;/p&gt;

&lt;p&gt;This includes tweets, photos and videos posted on various social media platforms, newsletters, and even blog posts. &lt;/p&gt;

&lt;p&gt;But in the field of ML/AI, it also extends to entire books and tutorials. &lt;/p&gt;

&lt;p&gt;Lewis Tunstall’s book “Natural Language Processing with Transformers” very effectively promotes Hugging Face, the company where he works. The famous course “Practical Deep Learning for Coders” promotes fast.ai, which admittedly is a non-profit organization. Google has tons of “hands-on” courses for ML engineers that essentially promote GCP. &lt;/p&gt;

&lt;p&gt;This is not to say that these promotional materials lack value. Like the fast.ai course, Lewis Tunstall’s book is very informative and covers topics that are highly relevant in the industry. &lt;/p&gt;

&lt;p&gt;But I would argue that the next time you come across educational material, you should ask yourself two questions. First, is this a premier resource that can actually help me learn topics that are frequently applied and sought after in the industry? Second, does the material lock me into the advertised product such that I would practically have to start from scratch if I wanted to move to an alternative? &lt;/p&gt;

&lt;p&gt;The main idea is to identify from early on how you, and the people who are creating and promoting the content, will benefit. &lt;/p&gt;

&lt;p&gt;Then there is content that is not necessarily sponsored or a native ad, but still promotional. &lt;/p&gt;

&lt;p&gt;For example, it is not uncommon for someone to write a blog post or post a video on social media with a call to action. This is often innocuous. &lt;/p&gt;

&lt;p&gt;It is not necessarily an attempt to instill false confidence or a feeling of productivity through a promotional post but just an attempt to build a stronger relationship with others. The call to action might simply be an invitation to join a newsletter. That way, it is easier to foster a relationship with the people who are interested in your content. &lt;/p&gt;

&lt;p&gt;I also have a newsletter. There, I will be posting even more interesting stuff that I don’t post on this platform. The signup page is at mleresource.com. &lt;/p&gt;

&lt;h2&gt;
  
  
  Identify the right tech stack
&lt;/h2&gt;

&lt;p&gt;Waylon Walker, a senior software engineer, wrote about the importance of focusing on the right topics and skills. &lt;/p&gt;

&lt;p&gt;He references a social media post where another LinkedIn user gives his own advice: focus on Python and C++, avoid “dead” software such as TensorFlow, don’t spend much time with R because it won’t get you far, and avoid languages that are “too academic,” like Haskell and Julia. &lt;/p&gt;

&lt;p&gt;I rarely use this word, but I’ll say it now. Baloney. &lt;/p&gt;

&lt;p&gt;These various tools exist because there are people and projects and companies in the world that have a use for them. TensorFlow is still very much alive. R has ggplot2, a library that can help create beautiful data visualizations. If you’re really good at Haskell, you can potentially get a job at a major tech company and use it there. &lt;/p&gt;

&lt;p&gt;Don’t chase after trends and buzzwords. Do identify a tech stack. &lt;/p&gt;

&lt;p&gt;What are your interests? What types of projects do you want to work on? What projects have you seen and thought “hey, I wish I did that”? Hint: it doesn’t have to be something that everybody is talking about. It just has to be something that’s practical and something that you can see yourself doing. &lt;/p&gt;

&lt;p&gt;Don’t think of your journey as something that involves learning everything relevant to AI or ML; that was the takeaway from Waylon’s blog post. And don’t feel pressured to learn something just because it’s popular. &lt;/p&gt;

&lt;p&gt;Identify what you want to do and what type of projects you want to work on. Figure out which skills you need to learn to fulfill that goal. Not the goal of optimizing your resume to match as many job descriptions as possible. &lt;/p&gt;

&lt;p&gt;Next, focus on learning those skills. &lt;/p&gt;

&lt;h2&gt;
  
  
  Check before diving in
&lt;/h2&gt;

&lt;p&gt;It’s quite simple. &lt;/p&gt;

&lt;p&gt;Before you embark on a course or begin reading a book, don’t just make the decision to invest your time and money in that material based on reviews. Or even based on glowing reviews from your friends and colleagues. &lt;/p&gt;

&lt;p&gt;Test the waters first. &lt;/p&gt;

&lt;p&gt;It is definitely worth looking over the material before you dive in. For example, consider skimming the table of contents of that textbook or O’Reilly book you’ve been eyeing for a long time. &lt;/p&gt;

&lt;p&gt;Maybe it’s not as useful as you think. The material might have become popular because it is relevant to certain types of companies or projects. A book on, say, statistical methods for applications in finance will only be so helpful if you want to become an expert on computer vision or MLops. &lt;/p&gt;

&lt;p&gt;Ask yourself: what is the end goal of studying this material? For example, it could be learning linear algebra concepts so you can become more comfortable with building deep learning models. &lt;/p&gt;

&lt;p&gt;This also applies to simple posts on apps like LinkedIn. What is the goal of this post, and how can it actually help me grow in my career?&lt;/p&gt;

&lt;h2&gt;
  
  
  Theory and concepts are more important than you think
&lt;/h2&gt;

&lt;p&gt;In the field of software engineering, we love applying things. So much so that we usually disregard theory and concepts. And this is relevant to AI/ML because software engineering is a superset of ML engineering. &lt;/p&gt;

&lt;p&gt;The internet is, thankfully, abundant with tutorials that can help us in various stages of the machine learning lifecycle. These tutorials can definitely be lifesavers when we need something figured out. &lt;strong&gt;Fellow software engineers, keep the tutorials coming!&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;The only issue is that many software engineers, and by extension ML engineers, often disregard theory and concepts. So many software engineers scoff at “leetcode questions” and brag that they never had to learn how to solve them. These questions essentially test understanding of data structures and algorithms. &lt;/p&gt;

&lt;p&gt;Likewise, ML engineers tend to focus primarily on the code. There seems to be more emphasis on getting the code to work than on learning the concepts, and this compromises the quality of the algorithms and even the products being built. &lt;/p&gt;

&lt;p&gt;It’s surprising how many self-identified software and ML engineers don’t even know what they’re doing. &lt;/p&gt;

&lt;p&gt;What’s even more concerning is that once they get to a certain point, many people in this field begin to neglect mathematics: namely, statistics, linear algebra, and even calculus. If these topics were ever learned, they are quickly forgotten. &lt;/p&gt;

&lt;p&gt;The advancements in technology, software engineering, and AI would not have been possible without math. Knowledge of these topics is still necessary if we want to contribute to the field. &lt;/p&gt;

&lt;h2&gt;
  
  
  Know when to get out of “tutorial purgatory”
&lt;/h2&gt;

&lt;p&gt;Tutorial purgatory might as well be considered the bane of any junior software engineer’s career. &lt;/p&gt;

&lt;p&gt;The sooner you get out, the better. &lt;/p&gt;

&lt;p&gt;In essence, tutorials are great because they help a beginner get their feet wet. But going through too many long-form tutorials can result in fear, and eventually it can become very difficult to get out of your comfort zone. &lt;/p&gt;

&lt;p&gt;I would argue that tutorials are still helpful to me, a mid-level ML engineer. It’s just that what I find helpful when working on projects are short-form tutorials and documentation, which exist only to support the projects I am working on. &lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus tip: Don’t be afraid to waste time
&lt;/h2&gt;

&lt;p&gt;You live and learn. &lt;/p&gt;

&lt;p&gt;Ever heard of the 10,000 hour rule? &lt;/p&gt;

&lt;p&gt;The more you focus on something, the better you get at it. &lt;/p&gt;

&lt;p&gt;Now you might argue that the 10,000 hour rule is not based on science. Or you might point out that it is only helpful if it involves deep work and deliberate practice. &lt;/p&gt;

&lt;p&gt;All of that is true. Deliberate practice certainly trumps more passive involvement. &lt;/p&gt;

&lt;p&gt;The reality is that when you begin a journey, you will make mistakes. You will spend time focusing on things that you later wish were never priorities. There’s even a chance your areas of focus might shift. &lt;/p&gt;

&lt;p&gt;And the latter might even happen multiple times. &lt;/p&gt;

&lt;p&gt;By focusing on “the wrong things,” you are still expanding your knowledge. And that knowledge might be helpful even after you become established in your career. &lt;/p&gt;

&lt;h1&gt;
  
  
  The End
&lt;/h1&gt;

&lt;p&gt;That’s it for today! I really hope this post was helpful to everyone. Especially for those who were feeling a bit overwhelmed by this whole process and all the content out there. &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>deeplearning</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
