In this tutorial, we demonstrate how combining ScrapeGraph's scraping tools with Gemini AI automates the collection, parsing, and analysis of competitor information. ScrapeGraph's SmartScraperTool (and MarkdownifyTool) lets users extract detailed information about competitor offerings, pricing strategy, technology stacks, market presence, and more directly from their websites, while Gemini's advanced language model combines these data points into structured intelligence. ScrapeGraph ensures the raw-data extraction is accurate and scalable, allowing analysts to concentrate on strategic interpretation rather than manual data collection.
%pip install --quiet -U langchain-scrapegraph langchain-google-genai pandas matplotlib seaborn
We quietly upgrade or install the latest versions of essential libraries, including langchain-scrapegraph for advanced web scraping and langchain-google-genai for integrating Gemini AI, as well as data analysis tools such as pandas, matplotlib, and seaborn, to ensure your environment is ready for seamless competitive intelligence workflows.
import getpass
import os
import json
import pandas as pd
from typing import List, Dict, Any
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
We import the essential Python libraries for a secure, data-driven pipeline: getpass captures API keys without echoing them, os manages environment variables, json handles serialization, and pandas provides powerful DataFrame operations. datetime timestamps each analysis run, the typing module supplies type hints to improve code clarity, and matplotlib.pyplot and seaborn give us the tools to create insightful visualizations.
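As a quick illustration of the timestamp format the analyzer uses throughout, here is a minimal sketch; a fixed example date is used instead of datetime.now() so the output is reproducible:

```python
from datetime import datetime

# The analyzer stamps each run with a human-readable timestamp in this format.
# A fixed date is used here so the printed value is deterministic.
stamp = datetime(2024, 1, 2, 3, 4, 5).strftime("%Y-%m-%d %H:%M:%S")
print(stamp)  # 2024-01-02 03:04:05
```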
if not os.environ.get("SGAI_API_KEY"):
    os.environ["SGAI_API_KEY"] = getpass.getpass("ScrapeGraph AI API key:\n")
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key for Gemini:\n")
The script checks whether the SGAI_API_KEY and GOOGLE_API_KEY environment variables are already set. If not, it securely prompts for the ScrapeGraph and Google (Gemini) API keys via getpass and stores them in the environment for subsequent authenticated requests.
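The same check can be factored into a small helper. require_env is a hypothetical name introduced here for illustration; it is not part of the tutorial's code:

```python
import getpass
import os

def require_env(name: str, prompt: str) -> str:
    """Return the named environment variable, prompting securely once if absent.
    (Illustrative helper mirroring the inline checks above.)"""
    if not os.environ.get(name):
        os.environ[name] = getpass.getpass(prompt)
    return os.environ[name]
```

With this in place, a call such as require_env("SGAI_API_KEY", "ScrapeGraph AI API key:\n") replaces each if-block.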
from langchain_scrapegraph.tools import (
SmartScraperTool,
SearchScraperTool,
MarkdownifyTool,
GetCreditsTool,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig, chain
from langchain_core.output_parsers import JsonOutputParser
smartscraper = SmartScraperTool()
searchscraper = SearchScraperTool()
markdownify = MarkdownifyTool()
credits = GetCreditsTool()
llm = ChatGoogleGenerativeAI(
model="gemini-1.5-flash",
temperature=0.1,
convert_system_message_to_human=True
)
Here, we import and instantiate the ScrapeGraph tools, SmartScraperTool, SearchScraperTool, MarkdownifyTool, and GetCreditsTool, for extracting and processing web data, then configure ChatGoogleGenerativeAI with the "gemini-1.5-flash" model. A low temperature (0.1) keeps the analysis focused and consistent, and convert_system_message_to_human ensures system prompts are passed in a form the Gemini API accepts. To structure prompts and parse model outputs, we also import ChatPromptTemplate, RunnableConfig, chain, and JsonOutputParser from langchain_core.
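Before wiring these tools into a class, it helps to see the shape of a single SmartScraperTool call. The payload below is illustrative (example.com is a placeholder URL), and the invoke line is commented out because it performs a network request and consumes API credits:

```python
# Illustrative payload: SmartScraperTool.invoke takes a natural-language
# extraction prompt plus the target URL.
payload = {
    "user_prompt": "List the product names and pricing tiers on this page.",
    "website_url": "https://example.com/pricing",
}
# result = smartscraper.invoke(payload)  # not run here: network call, uses credits
```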
class CompetitiveAnalyzer:
    def __init__(self):
        self.results = []
        self.analysis_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    def scrape_competitor_data(self, url: str, company_name: str = None) -> Dict[str, Any]:
        """Scrape comprehensive data from a competitor website"""
        extraction_prompt = """
        Extract the following information:
        1. Company name and tagline
        2. Products/services offered
        3. Pricing information (if available)
        4. Target audience/market
        5. Key features and benefits highlighted
        6. Technology stack mentioned
        7. Contact information
        8. Social media presence
        9. Recent news or announcements
        10. Team size indicators
        11. Funding information (if mentioned)
        12. Customer testimonials or case studies
        13. Partnership information
        14. Geographic presence/markets served

        Return the information as structured JSON with clear categorization.
        If any item is unavailable, mark it as "Not available".
        """
        try:
            result = smartscraper.invoke({
                "user_prompt": extraction_prompt,
                "website_url": url,
            })
            markdown_content = markdownify.invoke({"website_url": url})
            competitor_data = {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": result,
                "markdown_length": len(markdown_content),
                "analysis_date": self.analysis_timestamp,
                "success": True,
                "error": None,
            }
            return competitor_data
        except Exception as e:
            return {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": None,
                "error": str(e),
                "success": False,
                "analysis_date": self.analysis_timestamp,
            }
    def analyze_competitor_landscape(self, competitors: List[Dict[str, str]]) -> Dict[str, Any]:
        """Analyze multiple competitors and generate insights"""
        print(f"Starting competitive analysis for {len(competitors)} companies...")
        for i, competitor in enumerate(competitors, 1):
            print(f"Analyzing {competitor['name']} ({i}/{len(competitors)})...")
            data = self.scrape_competitor_data(competitor['url'], competitor['name'])
            self.results.append(data)
        analysis_prompt = ChatPromptTemplate.from_messages([
            ("system", """
            You are a senior business analyst specializing in competitive intelligence.
            Analyze the scraped competitor data and provide comprehensive insights including:
            1. Market positioning analysis
            2. Pricing strategy comparison
            3. Feature gap analysis
            4. Target audience overlap
            5. Technology differentiation
            6. Market opportunities
            7. Competitive threats
            8. Strategic recommendations

            Provide actionable insights in JSON format with clear categories and recommendations.
            """),
            ("human", "Analyze this competitive data: {competitor_data}")
        ])
        clean_data = []
        for result in self.results:
            if result['success']:
                clean_data.append({
                    'company': result['company_name'],
                    'url': result['url'],
                    'data': result['scraped_data']
                })
        analysis_chain = analysis_prompt | llm | JsonOutputParser()
        try:
            competitive_analysis = analysis_chain.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })
        except Exception:
            # Fall back to raw text if the model's output is not valid JSON
            analysis_chain_text = analysis_prompt | llm
            competitive_analysis = analysis_chain_text.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })
        return {
            "analysis": competitive_analysis,
            "raw_data": self.results,
            "summary_stats": self.generate_summary_stats()
        }
    def generate_summary_stats(self) -> Dict[str, Any]:
        """Generate summary statistics from the analysis"""
        successful_scrapes = sum(1 for r in self.results if r['success'])
        failed_scrapes = len(self.results) - successful_scrapes
        return {
            "total_companies_analyzed": len(self.results),
            "successful_scrapes": successful_scrapes,
            "failed_scrapes": failed_scrapes,
            "success_rate": f"{(successful_scrapes / len(self.results) * 100):.1f}%" if self.results else "0%",
            "analysis_timestamp": self.analysis_timestamp
        }
    def export_results(self, filename: str = None):
        """Export results to JSON and CSV files"""
        if not filename:
            filename = f"competitive_analysis_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
        with open(f"{filename}.json", "w") as f:
            json.dump({
                "results": self.results,
                "summary": self.generate_summary_stats()
            }, f, indent=2)
        df_data = []
        for result in self.results:
            if result['success']:
                df_data.append({
                    'Company': result['company_name'],
                    'URL': result['url'],
                    'Success': result['success'],
                    'Data_Length': len(str(result['scraped_data'])) if result['scraped_data'] else 0,
                    'Analysis_Date': result['analysis_date']
                })
        if df_data:
            df = pd.DataFrame(df_data)
            df.to_csv(f"{filename}.csv", index=False)
        print(f"Results exported to {filename}.json and {filename}.csv")
The CompetitiveAnalyzer class orchestrates the entire competitor-research workflow. It scrapes detailed company information with ScrapeGraph, cleans and compiles the results, and hands them to Gemini AI to generate structured competitive insights. It also tracks success rates and timestamps, and provides utility methods for exporting both raw and summarized data to JSON and CSV formats.
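To see the bookkeeping that generate_summary_stats performs, here is an offline sketch using hypothetical results (no scraping involved); sample_results stands in for CompetitiveAnalyzer.results after a run:

```python
# Hypothetical per-company records, as the analyzer would store them.
sample_results = [
    {"company_name": "Acme", "success": True},
    {"company_name": "Globex", "success": True},
    {"company_name": "Initech", "success": False},
]

# Same counting logic as generate_summary_stats.
successful = sum(1 for r in sample_results if r["success"])
failed = len(sample_results) - successful
success_rate = f"{successful / len(sample_results) * 100:.1f}%"
print(successful, failed, success_rate)  # 2 1 66.7%
```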
def run_ai_saas_analysis():
    """Run a comprehensive analysis of AI/SaaS competitors"""
    analyzer = CompetitiveAnalyzer()
    ai_saas_competitors = [
        {"name": "OpenAI", "url": "https://openai.com"},
        {"name": "Anthropic", "url": "https://anthropic.com"},
        {"name": "Hugging Face", "url": "https://huggingface.co"},
        {"name": "Cohere", "url": "https://cohere.ai"},
        {"name": "Scale AI", "url": "https://scale.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ai_saas_competitors)
    print("\n" + "=" * 80)
    print("COMPETITIVE ANALYSIS RESULTS")
    print("=" * 80)
    print("\nSummary Statistics:")
    stats = results['summary_stats']
    for key, value in stats.items():
        print(f"  {key.replace('_', ' ').title()}: {value}")
    print("\nStrategic Analysis:")
    if isinstance(results['analysis'], dict):
        for section, content in results['analysis'].items():
            print(f"\n{section.replace('_', ' ').title()}:")
            if isinstance(content, list):
                for item in content:
                    print(f"  • {item}")
            else:
                print(f"  {content}")
    else:
        print(results['analysis'])
    analyzer.export_results("ai_saas_competitive_analysis")
    return results
This function kicks off the competitive analysis by instantiating CompetitiveAnalyzer and defining the key AI/SaaS providers to evaluate. It then runs the full scraping-and-insights workflow, prints formatted summary statistics and strategic findings, and finally exports the detailed results to JSON and CSV for further use.
def run_ecommerce_analysis():
    """Analyze e-commerce platform competitors"""
    analyzer = CompetitiveAnalyzer()
    ecommerce_competitors = [
        {"name": "Shopify", "url": "https://shopify.com"},
        {"name": "WooCommerce", "url": "https://woocommerce.com"},
        {"name": "BigCommerce", "url": "https://bigcommerce.com"},
        {"name": "Magento", "url": "https://magento.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ecommerce_competitors)
    analyzer.export_results("ecommerce_competitive_analysis")
    return results
This function creates a CompetitiveAnalyzer to evaluate major e-commerce platforms: it scrapes data from their sites, generates strategic insights, and exports the results to JSON and CSV files named "ecommerce_competitive_analysis".
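The dual JSON/CSV export pattern used by export_results can be sketched with the standard library alone; the demo filename and row below are made up for illustration, and stdlib csv is used in place of pandas so the sketch runs anywhere:

```python
import csv
import json

# Hypothetical row mimicking the per-company records the analyzer exports.
rows = [{"Company": "Acme", "URL": "https://acme.example", "Success": True}]

# JSON export: full nested structure.
with open("demo_export.json", "w") as f:
    json.dump({"results": rows}, f, indent=2)

# CSV export: flat tabular view of the same records.
with open("demo_export.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```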
@chain
def social_media_monitoring_chain(company_urls: List[str], config: RunnableConfig):
    """Monitor social media presence and engagement strategies of competitors"""
    social_media_prompt = ChatPromptTemplate.from_messages([
        ("system", """
        You are a social media strategist. Analyze the social media presence and strategies
        of these companies. Focus on:
        1. Platform presence (LinkedIn, Twitter, Instagram, etc.)
        2. Content strategy patterns
        3. Engagement tactics
        4. Community building approaches
        5. Brand voice and messaging
        6. Posting frequency and timing

        Provide actionable insights for improving social media strategy.
        """),
        ("human", "Analyze social media data for: {urls}")
    ])
    social_data = []
    for url in company_urls:
        try:
            result = smartscraper.invoke({
                "user_prompt": "Extract all social media links, community engagement features, and social proof elements",
                "website_url": url,
            })
            social_data.append({"url": url, "social_data": result})
        except Exception as e:
            social_data.append({"url": url, "error": str(e)})
    analysis_chain = social_media_prompt | llm
    analysis = analysis_chain.invoke({"urls": json.dumps(social_data, indent=2)}, config=config)
    return {
        "social_analysis": analysis,
        "raw_social_data": social_data
    }
This chained function defines a pipeline for gathering and analyzing competitors' social footprints. It uses ScrapeGraph's smart scraper to extract social media links, engagement features, and social proof elements, then feeds the data to Gemini with a prompt focused on content strategy, community tactics, and platform presence. It returns the AI-generated social media insights and the raw scraped data in one structured output.
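The per-URL error capture inside the loop is what keeps one failing site from aborting the whole batch. Here is an offline sketch of that pattern, with fake_fetch as a made-up stand-in for smartscraper.invoke and hypothetical example URLs:

```python
def fake_fetch(url: str) -> dict:
    """Stand-in for smartscraper.invoke: fails for one URL to show error capture."""
    if "bad" in url:
        raise ValueError("site unreachable")
    return {"links": [url + "/twitter"]}

social_data = []
for url in ["https://good.example", "https://bad.example"]:
    try:
        social_data.append({"url": url, "social_data": fake_fetch(url)})
    except Exception as e:
        # Record the failure instead of propagating it, so the batch continues.
        social_data.append({"url": url, "error": str(e)})
```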
def check_credits():
    """Check available credits"""
    try:
        credits_info = credits.invoke({})
        print(f"Available Credits: {credits_info}")
        return credits_info
    except Exception as e:
        print(f"Warning: could not check credits: {e}")
        return None
The check_credits function above uses GetCreditsTool to retrieve and display your remaining ScrapeGraph API credits. If the check fails, it prints a warning; otherwise it returns the credit information (or None on error).
if __name__ == "__main__":
    print("Advanced Competitive Analysis Tool with Gemini AI")
    print("=" * 60)
    check_credits()
    print("\nRunning AI/SaaS Competitive Analysis...")
    ai_results = run_ai_saas_analysis()
    run_additional = input("\nRun e-commerce analysis as well? (y/n): ").lower().strip()
    if run_additional == 'y':
        print("\nRunning E-commerce Platform Analysis...")
        ecom_results = run_ecommerce_analysis()
    print("\nAnalysis complete! Check the exported files for detailed results.")
This final block is the script's entry point: it prints a heading, verifies API credits, and then runs the AI/SaaS competitive analysis (and, optionally, the e-commerce analysis) before signaling that all results have been exported.
Integrating ScrapeGraph's scraping capabilities with Gemini AI turns a time-consuming competitive-intelligence process into a repeatable, efficient pipeline. ScrapeGraph handles the heavy lifting of fetching and normalizing web data, while Gemini's natural language processing turns that raw information into high-level recommendations. Businesses can quickly assess the market, find feature gaps, and discover emerging opportunities. Automating these steps gives users consistency and speed, along with the freedom to extend their analyses to other markets or competitors as needed.
Check out the Notebook on GitHub. All credit for this research goes to the researchers of this project.
Asif Razzaq, CEO of Marktechpost Media Inc. is a visionary engineer and entrepreneur who is dedicated to harnessing Artificial Intelligence’s potential for the social good. Marktechpost is his latest venture, a media platform that focuses on Artificial Intelligence. It is known for providing in-depth news coverage about machine learning, deep learning, and other topics. The content is technically accurate and easy to understand by an audience of all backgrounds. Over 2 million views per month are a testament to the platform’s popularity.

