The project assumes the role of a eCommerce database analyst for a newly launched eCommerce startup named Maven Fuzzy Factory. The project's primary focus is not only to develop advanced SQL queries, but to thoroughly emphasise the business logic and decision support behind it. This project is based on Advanced SQL: MySQL for Ecommerce & Web Analytics, a Udemy course by John Pauler. However, outside of the problem statement framework, everything in the project is done completely by me.
The project is aimed to simulate a realistic environment where business decisions are linked and evolve with time, rather than being a series of disconnected snapshots.
By accounting for the passage of time, the project demonstrates how decisions made in early stages shape the company’s overall growth trajectory. The timespan begins from March 2012.
This project is based on a series of problem statements related to an e-commerce startup's growth and operations. While the high-level business questions and raw datasets were provided as a framework, all SQL architecture, data querying & transformation logic, query building, and output verification were independently developed and executed solely by me. As an extension of the provided framework, I have learned the business context in which high-level business questions have been framed.
The project is based on a highly comprehensive eCommerce database built from scratch by John Pauler, member of the Maven Analytics team. Instead of working on random samples of data, the database is designed to closely mimic real-word databases specific to eCommerce startups and internet retailers that heavily rely on online stores to generate sales. The official link of the dataset can be visited to retrieve the database.
The database contains six related tables:
orders- contains purchase orders placed by customers; order might contain multiple itemsorder_items- can be linked to the orders table to get information about the number of items contained within a particular orderproducts- can be linked with order_items to get product level information for a placed orderorder_item_refunds- can be linked to the orders table to get information about refunds made on orders with customer complaintswebsite_sessions- helps identify the source of website traffic (via utm and related parameters) through which orders are being generatedwebsite_pageviews- contains information on pages of a website visited by a user; for a specific website session
The aim of the project is to help the management (i.e. CEO, Marketing Director and Website Manager) grow the eCommerce business and understand how to improve performance. Analysis is performed to optimize the business’ marketing channels, website, and product portfolio. All analyses are performed using the MavenFuzzyFactory database on MySQL Workbench.
The project is intended to perform BI analysis by querying the database, instead of working on database warehousing, building and maintenance.
The highlights mentioned for each section of the project can be verified by looking at the output tables.
Assisted the Marketing Director in optimizing paid marketing channels using SQL-based traffic source analysis:
- Analyzed UTM parameters to identify primary sources of website traffic, finding
gsearch non-brandas the dominant paid channel. - Evaluated conversion rates against a
4% CVR benchmarkto reveal that the paid traffic bids were not justified. - Performed weekly trend analysis to measure changes in paid traffic volume following bid adjustments.
- Identified a significant CVR gap between
desktopandmobileusers (3.73% v. 0.97% resp.), leading to increased bids ondesktop traffic. - Post-optimization,
desktop sessionsandoverall paid trafficvolume increased over time, shown via weekly trend analysis.
Analyzed on-site user behavior and conversion performance to identify content bottlenecks and optimize the purchase journey:
- Ranked most-viewed website pages by session volume to understand how users interact with site content, identifying
/homeas the dominant entry page. - Calculated homepage bounce rate (~60%) by isolating landing-page sessions, highlighting poor engagement for paid search traffic.
- Designed an A/B test for a new paid-search landing page
/lander-1, showing a lower bounce rate for/lander-1vs./home. - Conducted weekly trend analysis to track traffic re-routing from
/hometo/lander-1, demonstrating a drop in overall bounce rate from ~60% to ~50%. - Built a full conversion funnel from landing page to order confirmation, quantifying user drop-off at each step of the purchase journey.
- Identified key funnel leakage points (
/lander-1,/products, and/billingpages) using step-level click-through rates. - Performed an A/B test on billing page redesign (
/billingvs/billing-2), showing a significant lift in billing-to-order conversion (≈45% → 62%).
Analyzed how the business is doing in the first 8 months of its operations, to help the CEO prepare for the board meeting:
gsearchsessions and orders increased steadily month over month, confirming it as the primary growth channel.- Rising
brandbased sessions and orders fromgsearchtraffic indicate increasing user awareness and intent. desktoptraffic generates most sessions and orders, butmobilesessions and orders also increased over time despite bid increases applied only todesktop.- Growth in
direct-sourcesessions indicates increasing organic brand recognition where the business doesn't have to pay to attract users. - Session to order
conversion ratein Nov 2012 is 38% higher than in Mar 2012, alongside 578% session growth and 835% order growth. /lander-1outperforms/homewith a 5% higher entry-pageclickthrough rate, within a seven stage conversion funnel.
Identified company's channel portfolio to comprise four marketing channels, namely, nonbrand traffic (paid), brand traffic (paid), direct traffic (unpaid) and organic traffic (unpaid). Analyzed channel portfolio to improve the effectiveness of the marketing budget:
- Within 3 months of launch
bsearch nonbrandweekly sessions reached ~40% ofgsearch nonbrandvolume, validatingbsearchas a meaningful secondary paid channel. mobileaccounts for 24.5% ofgsearchsessions but only 8.6% forbsearch, indicating thatbsearchis predominantlydesktopdriven and should not mirrorgsearchdevice strategy.gsearchoutperformsbsearchin terms of CVR on bothdesktop(4.5% vs 3.8%) andmobile(1.14% vs 0.8%), justifying reduction ofbsearchbids to avoid inefficient spend.- sessions from
brand,direct, andorganicchannels grew both absolute terms & from ~0.5% to ~7% ofnonbrandvolume (in % terms), suggesting increased brand awareness and reduced dependency on paid viewer acquisition
