<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: BAH123</title>
    <description>The latest articles on Forem by BAH123 (@bah123).</description>
    <link>https://forem.com/bah123</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2707292%2F6f0e5259-8ddf-48ff-8cc7-dde15beac43e.jpg</url>
      <title>Forem: BAH123</title>
      <link>https://forem.com/bah123</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/bah123"/>
    <language>en</language>
    <item>
      <title>How to Get Filtered Amazon Reviews into a Pandas DataFrame in Under 50 Lines of Python</title>
      <dc:creator>BAH123</dc:creator>
      <pubDate>Sun, 16 Nov 2025 06:34:47 +0000</pubDate>
      <link>https://forem.com/bah123/how-to-get-filtered-amazon-reviews-into-a-pandas-dataframe-in-under-50-lines-of-python-71o</link>
      <guid>https://forem.com/bah123/how-to-get-filtered-amazon-reviews-into-a-pandas-dataframe-in-under-50-lines-of-python-71o</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a3i3f24wz36g7k1umzo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a3i3f24wz36g7k1umzo.jpg" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
We've all been there.&lt;/p&gt;

&lt;p&gt;You need data from Amazon. You write a simple requests script. It works. Then... 403 Forbidden. CAPTCHA. IP Ban. You add proxies, User-Agent rotation. Next week, Amazon changes a CSS class, and your script breaks again.&lt;/p&gt;

&lt;p&gt;The truth is: maintaining scrapers is worse than writing them.&lt;/p&gt;

&lt;p&gt;In this tutorial, we're skipping all that pain with an API-first approach. A specialized API (a pre-built Apify Actor) handles the scraping hell, while we focus on the fun part: analyzing the data with Python and Pandas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Set Up Your Environment&lt;/strong&gt;&lt;br&gt;
First, let's install the libraries. We'll use apify-client to call the API and pandas for data handling.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install apify-client pandas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll also need an Apify account (the free tier is fine) to get your API token. You can find it in your account settings under "Integrations."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Define Your "Advanced Filter" Task&lt;/strong&gt;&lt;br&gt;
We don't want all reviews. That's noise. We want the "verified purchase" 1- and 2-star reviews to find a competitor's fatal flaw.&lt;/p&gt;

&lt;p&gt;To run this specific scrape, we'll call the &lt;a href="https://apify.com/delicious_zebu/amazon-reviews-scraper-with-advanced-filters" rel="noopener noreferrer"&gt;Amazon Reviews Scraper with Advanced Filters&lt;/a&gt;. It accepts a JSON input that describes the filters, so the filtering happens on the server side rather than in our script.&lt;/p&gt;

&lt;p&gt;Here's the JSON "payload" we'll send:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "productAsins": ["B09JVCL7JR"], 
  "filterByStarRating": [1, 2],
  "filterByVerifiedPurchase": true,
  "minReviewLength": 50,
  "maxReviews": 100 
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(I'm using a popular earbud ASIN as an example. Capping maxReviews at 100 keeps our test run fast.)&lt;/p&gt;
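
&lt;p&gt;If you plan to reuse this payload for several products, a small helper can build it and catch obvious mistakes before the run starts. This is only a sketch: build_review_input is a hypothetical name of my own, the field names are copied from the payload above, and the 1 to 5 star check reflects Amazon's rating scale rather than anything the Actor documents. Note that JSON's true becomes True in Python.&lt;/p&gt;

```python
def build_review_input(asins, star_ratings=(1, 2), verified_only=True,
                       min_review_length=50, max_reviews=100):
    """Assemble the Actor input dict, failing fast on obviously bad values."""
    # Drop empty strings and stray whitespace from the ASIN list
    asins = [a.strip() for a in asins if a.strip()]
    if not asins:
        raise ValueError("at least one ASIN is required")

    # Amazon ratings are whole stars from 1 to 5
    invalid = [s for s in star_ratings if s not in range(1, 6)]
    if invalid:
        raise ValueError(f"star ratings must be integers from 1 to 5, got {invalid}")

    # Keys mirror the JSON payload shown above
    return {
        "productAsins": asins,
        "filterByStarRating": sorted(set(star_ratings)),
        "filterByVerifiedPurchase": bool(verified_only),
        "minReviewLength": int(min_review_length),
        "maxReviews": int(max_reviews),
    }


actor_input = build_review_input(["B09JVCL7JR"])
print(actor_input)
```

&lt;p&gt;The resulting dict can be passed as run_input exactly like a hand-written one.&lt;/p&gt;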

&lt;p&gt;&lt;strong&gt;Step 3: Run the Actor &amp;amp; Fetch Data (The Python Script)&lt;/strong&gt;&lt;br&gt;
Now for the core code. We'll initialize the client, call the Actor, wait for it to finish, and pull the results into a list.&lt;/p&gt;

&lt;p&gt;This code does all the heavy lifting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import pandas as pd
from apify_client import ApifyClient

# Get your token from an environment variable (recommended)
# Or just paste it: apify_client = ApifyClient("YOUR_TOKEN")
APIFY_TOKEN = os.environ.get("APIFY_TOKEN")

if not APIFY_TOKEN:
    raise RuntimeError("Please set the 'APIFY_TOKEN' environment variable")

# 1. Initialize the client
apify_client = ApifyClient(APIFY_TOKEN)

print("Running the Actor...")

# 2. Define our input payload
actor_input = {
  "productAsins": ["B09JVCL7JR"], # Example ASIN (same as the payload in Step 2)
  "filterByStarRating": [1, 2],
  "filterByVerifiedPurchase": True,
  "minReviewLength": 50,
  "maxReviews": 100 
}

# 3. Start the Actor and block until it finishes (or the wait limit is reached)
run = apify_client.actor("delicious_zebu/amazon-reviews-scraper-with-advanced-filters").call(
    run_input=actor_input,
    wait_secs=120 # Wait a max of 2 minutes
)

print("Run finished. Fetching results...")

# 4. Get the results from the Actor's dataset
items = list(apify_client.dataset(run["defaultDatasetId"]).iterate_items())

# 5. Load into a Pandas DataFrame
df = pd.DataFrame(items)

print(f"Successfully fetched {len(df)} reviews.")
print(df.head())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
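
&lt;p&gt;One caveat with wait_secs: as far as I understand the client, .call() returns once the wait limit is reached even if the run is still in progress, so the dataset may be incomplete at that point. A minimal sketch of a guard you could place before fetching results follows. The helper name ensure_run_succeeded and the stand-in dicts are hypothetical; "status" and "defaultDatasetId" are fields of the run object, and "SUCCEEDED" is the status Apify reports for a finished run.&lt;/p&gt;

```python
def ensure_run_succeeded(run):
    """Return the run's default dataset ID, or raise if the run did not succeed."""
    # Defensive: treat a missing run object as a failure
    if run is None:
        raise RuntimeError("the Actor run could not be started")

    status = run.get("status")
    if status != "SUCCEEDED":
        raise RuntimeError(f"Actor run has status {status!r}; results may be incomplete")

    return run["defaultDatasetId"]


# Stand-in dicts shaped like the run object returned by .call()
finished = {"status": "SUCCEEDED", "defaultDatasetId": "abc123"}
still_running = {"status": "RUNNING", "defaultDatasetId": "abc123"}

print(ensure_run_succeeded(finished))
```

&lt;p&gt;In the script above, you would call ensure_run_succeeded(run) right after .call() and pass its return value to apify_client.dataset(...).&lt;/p&gt;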



&lt;p&gt;&lt;strong&gt;Step 4: [The Payoff] Local Analysis with Pandas&lt;/strong&gt;&lt;br&gt;
Just like that. No messing with headless browsers, no proxies, no parsing HTML. We now have a clean Pandas DataFrame.&lt;/p&gt;

&lt;p&gt;Now for the fun part. Let's analyze it instantly.&lt;/p&gt;

&lt;p&gt;Let's find out how many of these negative reviews mention "battery" or "connection" issues:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if 'reviewText' in df.columns and not df.empty:
    # Find reviews mentioning 'battery' or 'connection' issues
    keywords = ['battery', 'connection', 'disconnect', 'charge']

    # Build a regex pattern
    pattern = '|'.join(keywords)

    # Filter the DataFrame
    complaints_df = df[df['reviewText'].str.contains(pattern, case=False, na=False)]

    print(f"\nFound {len(complaints_df)} complaints out of {len(df)} total reviews mentioning: {keywords}")

    # Print a few examples
    for text in complaints_df['reviewText'].head(5):
        print(f"- {text[:150]}...") # Print a snippet
else:
    print("\n'reviewText' column not found or DataFrame is empty.")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Found 42 complaints out of 100 total reviews mentioning: ['battery', 'connection', 'disconnect', 'charge']

- The right earbud disconnects constantly. I've tried everything...
- Battery life is a joke, lasts maybe 2 hours instead of the 8 advertised...
- Love the sound, but the connection drops every 5 minutes...
- Won't hold a charge after 3 weeks...
- ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a few lines of Pandas, we've zeroed in on this competitor's potential fatal flaw: connection and battery issues.&lt;/p&gt;
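
&lt;p&gt;The combined count tells us that these complaints exist, but not which one dominates. Here is a rough sketch of a per-keyword breakdown using only the standard library. The function name count_keyword_mentions is hypothetical and the sample reviews are made up for illustration; they are not real scraper output.&lt;/p&gt;

```python
import re
from collections import Counter


def count_keyword_mentions(texts, keywords):
    """Count how many reviews mention each keyword (case-insensitive substring match)."""
    counts = Counter({kw: 0 for kw in keywords})
    for text in texts:
        if not isinstance(text, str):
            continue  # skip missing values such as None or NaN
        for kw in keywords:
            # re.escape keeps keywords containing regex characters literal
            if re.search(re.escape(kw), text, flags=re.IGNORECASE):
                counts[kw] += 1
    return counts


# Made-up sample data standing in for the reviewText column
sample_reviews = [
    "The right earbud disconnects constantly.",
    "Battery life is a joke, and it won't hold a charge.",
    "Love the sound, but the connection drops every 5 minutes.",
    "Battery died after a week.",
    None,
]
keywords = ["battery", "connection", "disconnect", "charge"]

for keyword, n in count_keyword_mentions(sample_reviews, keywords).most_common():
    print(f"{keyword}: {n}")
```

&lt;p&gt;Because the function only iterates over its input, you can pass df['reviewText'] directly instead of a list. Keep in mind that substring matching is loose: "charge" also matches "charger".&lt;/p&gt;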

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
What did we just do?&lt;/p&gt;

&lt;p&gt;We built a reproducible, reliable data pipeline in under 50 lines of Python. We completely skipped the fragile "scraper dev &amp;amp; maintenance" cycle.&lt;/p&gt;

&lt;p&gt;By offloading the scraping task to a specialized API endpoint (the Actor we used today), we avoid the dev and maintenance time a hand-rolled scraper demands and can focus on what actually matters: analyzing the data.&lt;/p&gt;

&lt;p&gt;You can find this &lt;a href="https://apify.com/delicious_zebu/amazon-reviews-scraper-with-advanced-filters" rel="noopener noreferrer"&gt;Amazon Reviews Scraper with Advanced Filters&lt;/a&gt; on the Apify Store. Happy coding!&lt;/p&gt;

</description>
      <category>python</category>
      <category>scraper</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
