Unveiling the True Costs: Cloud vs. On-Premises Data Warehouse Solutions

Pricing is a significant challenge in data warehousing. When choosing an architecture for the long term, it’s crucial to be aware of ongoing costs. Many individuals in the industry quickly adopt the latest technologies to alleviate technical frustrations, but this can lead to significant hidden monthly expenses. Our goal should be to preserve capital and spend only what is necessary. While there is no one-size-fits-all solution, aiming for more efficient outcomes is essential.

In today’s post, I will provide two use cases: one for a small private business and the other for an enterprise.

The purpose of today’s post is not to provide an in-depth analysis of how each cloud provider operates but rather to give an overview of the costing. Keep in mind that this is a theoretical use case, and actual costs may vary slightly.

Here are the three cloud giants:

Amazon Redshift

Amazon Redshift works seamlessly with numerous data loading, ETL, BI reporting, data mining, and analytics tools. Built on PostgreSQL, it ensures compatibility with most SQL client applications with minimal modifications.

Google BigQuery

Google BigQuery integrates effortlessly with various data loading, ETL tools, and business intelligence (BI) reporting, data mining, and analytics tools. As a serverless, highly scalable data warehouse, it supports SQL-like queries, making it user-friendly for those familiar with SQL. BigQuery’s native support for machine learning and advanced analytics allows for comprehensive data analysis without requiring extensive modifications to existing client applications.

Azure Synapse Analytics

Azure Synapse Analytics, formerly known as SQL Data Warehouse, integrates with a wide range of data loading, ETL tools, and BI reporting, data mining, and analytics tools. Leveraging the power of Microsoft’s SQL Server, it offers compatibility with many SQL client applications with minimal changes needed. Synapse Analytics provides a unified platform for data warehousing and big data analytics, enabling seamless data integration, exploration, and management.

Use Case #1

Hutchinson and Co. is a retail company with 100 employees and $10 million in annual gross profit. For their data analytics needs, they have 10 employees who use the data warehouse several times per week, and one database administrator responsible for maintaining the ETL pipeline and ensuring data validation. Their primary focus is BI reports.

Here is a crude representation of a ChatGPT analysis estimating monthly transactions, to get a feel for how many records will be produced:

Python
# Steps to estimate monthly transactions

# Step 1: Calculate monthly gross profit
annual_gross_profit = 10_000_000
monthly_gross_profit = annual_gross_profit / 12

# Step 2: Calculate monthly transactions
average_product_price = 100
monthly_transactions = monthly_gross_profit / average_product_price

# Conclusion
# Monthly gross profit: $833,333.33
# Monthly transactions: ~8,333
# Yearly transactions: 10,000,000 / 100 = 100,000

Data Estimation

Step-by-Step Estimation

Transaction Record Size:

As previously calculated:
162 bytes per transaction.
Total storage for transactions per year:
100,000 × 162 = 16,200,000 bytes ≈ 15.45 MB

Product Data:
Let's assume the company has 10,000 unique products. Each product record might include:
- Product ID: 10 bytes
- Product Name: 50 bytes
- Description: 200 bytes
- Price: 8 bytes
- Category: 20 bytes
- Image: Assume each image is 200 KB

Total size per product record:
10 + 50 + 200 + 8 + 20 + (200 × 1024) = 205,088 bytes ≈ 205,000 bytes
Total storage for product data:
10,000 × 205,000 = 2,050,000,000 bytes ≈ 1,955 MB ≈ 1.91 GB

Customer Data:
Assume the company has 50,000 customers. Each customer record might include:
- Customer ID: 10 bytes
- Name: 50 bytes
- Email: 50 bytes
- Address: 100 bytes
- Purchase History: Assume 1 KB per customer

Total size per customer record:
10 + 50 + 50 + 100 + 1024 = 1,234 bytes
Total storage for customer data:
50,000 × 1,234 = 61,700,000 bytes ≈ 58.86 MB

Multimedia Data:
Assuming the company stores product images and promotional videos. Let's estimate:
- Product images: Already included in product data.
- Videos: Assume 100 promotional videos, each 50 MB.

Total storage for videos:
100 × 50 MB = 5,000 MB = 5 GB

Annual Data Storage Requirement:
Transaction data: 15.45 MB/year
Product data: 1.91 GB
Customer data: 58.86 MB
Multimedia data: 5 GB

Total Storage for 1 Year:
15.45 MB + 1.91 GB + 58.86 MB + 5 GB
= 15.45 MB + 1,955 MB + 58.86 MB + 5,000 MB
= 7,029.31 MB ≈ 6.86 GB
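The storage estimate above can be reproduced with a short Python sketch. The record sizes and counts are the assumptions stated above; using the exact product-record size of 205,088 bytes (the fields plus a 200 KB image) gives ≈6.87 GB, a hair above the rounded 6.86 GB.

```python
# Storage estimate for Use Case #1 (record sizes and counts assumed above).
TRANSACTIONS_PER_YEAR = 100_000
TRANSACTION_BYTES = 162                                # per transaction record

PRODUCTS = 10_000
PRODUCT_BYTES = 10 + 50 + 200 + 8 + 20 + 200 * 1024    # fields + 200 KB image

CUSTOMERS = 50_000
CUSTOMER_BYTES = 10 + 50 + 50 + 100 + 1024             # fields + 1 KB history

VIDEO_BYTES = 100 * 50 * 1024 ** 2                     # 100 promo videos, 50 MB each

total_bytes = (
    TRANSACTIONS_PER_YEAR * TRANSACTION_BYTES
    + PRODUCTS * PRODUCT_BYTES
    + CUSTOMERS * CUSTOMER_BYTES
    + VIDEO_BYTES
)
print(f"First-year storage: {total_bytes / 1024 ** 3:.2f} GB")  # ≈ 6.87 GB
```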

Hypothetical On-Premise Costing – Use Case #1

| Item | Description | Cost (USD) |
| --- | --- | --- |
| Server Hardware | | |
| Processor | Intel Xeon E3-1220 v6 (3.0 GHz, 4 cores) | 250 |
| RAM | 32 GB DDR4 | 150 |
| Storage (Primary) | 2 × 500 GB SSD (RAID 1) | 300 |
| Storage (Backup) | 2 × 1 TB HDD (RAID 1) | 100 |
| Motherboard | Compatible server motherboard | 200 |
| Power Supply Unit (PSU) | 500 W | 100 |
| Chassis | Standard 1U/2U rack-mount chassis | 150 |
| Subtotal for Server Hardware | | 1,250 |
| Network Infrastructure | | |
| Ethernet Switch | Gigabit Ethernet switch (24-port) | 200 |
| Cables and Patch Panels | Ethernet cables and patch panels | 100 |
| Subtotal for Network | | 300 |
| Software and Licensing | | |
| PostgreSQL | Open-source | Free |
| Operating System | Windows Server | 500 |
| Backup Software | Veeam | 400 |
| Subtotal for Software | | 900 |
| Backup Solutions | | |
| External Backup Drives | For offsite backup (2 × $100) | 200 |
| Subtotal for Backup Solutions | | 200 |
| Maintenance and Support | Treated as an annual cost (see below) | — |
| Total Initial Setup Cost | | 2,650 |

For good measure, we round the estimated setup cost up to $5,000.

Total Cost Calculation Over 20 Years

Initial Setup Cost: $5,000 (one-time)
Annual Maintenance Cost for 20 years:
750 USD/year × 20 years = 15,000 USD

Total Cost Over 20 Years

Total Cost = Initial Setup Cost + Annual Maintenance Cost for 20 years
Total Cost = 5,000 USD + 15,000 USD
Total Cost = 20,000 USD

Estimating Cloud Machine – Use Case #1

Cloud machine estimate: a machine with 4 vCPUs and 26 GB of memory is priced at about $270.46 per month, but let's bump that up to $350 for our estimation.

On-Premise vs. Cloud – Use Case #1

After 1 Year:

On-Premise:
5,000 + (750 maintenance × 1) = 5,750 USD

Cloud: 
350 × 12 = 4,200 USD

On-premise is more expensive after 1 year.

After 2 Years:

On-Premise: 
5,000 + 750 × 2 = 6,500 USD

Cloud: 
350 × 24 = 8,400 USD

On-premise is cheaper after 2 years.

After 5 Years:

On-Premise: 
5,000 + 750 × 5 = 8,750 USD

Cloud: 
350 × 60 = 21,000 USD

On-premise is significantly cheaper after 5 years.
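The year-by-year comparison above can be generalized to find the exact break-even point. This is a minimal sketch assuming the figures stay flat ($5,000 setup, $750/year maintenance, $350/month cloud):

```python
# Cumulative cost model for Use Case #1 (all figures assumed from the text).
ONPREM_SETUP = 5_000      # one-time setup, USD
ONPREM_MAINT = 750        # annual maintenance, USD
CLOUD_MONTHLY = 350       # cloud subscription, USD/month

def onprem_cost(years: float) -> float:
    return ONPREM_SETUP + ONPREM_MAINT * years

def cloud_cost(years: float) -> float:
    return CLOUD_MONTHLY * 12 * years

# Break-even where 5,000 + 750y = 4,200y  ->  y = 5,000 / 3,450
breakeven_years = ONPREM_SETUP / (CLOUD_MONTHLY * 12 - ONPREM_MAINT)
print(f"Break-even after ~{breakeven_years:.2f} years")  # ~1.45 years

for y in (1, 2, 5):
    print(f"Year {y}: on-premise {onprem_cost(y):,.0f} vs cloud {cloud_cost(y):,.0f}")
```

In other words, the cloud is cheaper only for roughly the first year and a half; after that, the on-premise server pulls ahead.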

Hypothetical On-Premise Costing – Use Case #2

| Item | Description | Initial Cost (USD) | Annual Cost (USD) |
| --- | --- | --- | --- |
| Server Hardware | 10 high-performance servers | 323,000 | 30,000 |
| Storage Solutions | SAN/NAS systems and additional disks | 150,000 | — |
| Network Infrastructure | Switches, cables, security devices | 45,000 | — |
| Software and Licensing | OS, database support, backup software | 45,000 | 30,000 |
| Backup Solutions | Offsite backup and backup drives | 70,000 | — |
| Maintenance and Support | Hardware maintenance and IT staff | — | 180,000 |
| Data Center Costs | Rack space rental, power, and cooling | — | 68,000 |
| Power and Cooling | Power consumption and cooling systems | — | 25,000 |
| Total Initial Setup Cost | | 633,000 | |
| Total Annual Maintenance Cost | | | 333,000 |

Summary

  • Total Initial Setup Cost: $633,000
  • Total Annual Maintenance Cost: $333,000

Cost Over 20 Years

To calculate the total cost over 20 years:

Initial Setup Cost: $633,000 (one-time)

Total Annual Maintenance Cost for 20 years:
333,000 × 20 = 6,660,000 USD

Total Cost Over 20 Years:
633,000 + 6,660,000 = 7,293,000 USD

The total cost for setting up and maintaining an on-premise solution with 100 TB of storage for a company with 1,000 employees over 20 years is approximately $7,293,000.

Total Cost Estimate for Google BigQuery

| Cost Component | Description | Annual Cost (USD) |
| --- | --- | --- |
| Storage Costs | 100 TB of active storage | 24,000 |
| Query Costs | 10 TB of data processed per month | 600 |
| Data Ingestion Costs | 1 TB of data ingested per month | 614.40 |
| Backup Storage Costs | 20 TB of backup storage | 4,800 |
| Total Annual Cost | | 30,014.40 |
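The annual total can be back-calculated from per-unit rates. The rates below are assumptions chosen to match the table's figures, not current GCP list prices; note the figures mix 1 TB = 1,000 GB for storage with 1 TB = 1,024 GB for ingestion, and the sketch mirrors that.

```python
# Annual BigQuery cost from assumed per-unit rates (chosen to match the table).
storage   = 100_000 * 0.02 * 12   # 100 TB active storage at $0.02/GB-month
queries   = 10 * 5 * 12           # 10 TB/month processed at $5/TB
ingestion = 1_024 * 0.05 * 12     # 1 TB/month streamed at $0.05/GB (1 TB = 1,024 GB)
backup    = 20_000 * 0.02 * 12    # 20 TB backup storage at $0.02/GB-month
annual = storage + queries + ingestion + backup
print(f"Total annual cost: ${annual:,.1f}")  # $30,014.4
```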

Cost Over 20 Years

Total Annual Cost: 30,014.4 USD

Total Cost Over 20 Years:
30,014.4 USD/year × 20 years = 600,288 USD

Summary

  • Total Annual Cost for Google BigQuery: $30,014.4
  • Total Cost Over 20 Years for Google BigQuery: $600,288

Compared to the on-premise solution, which costs approximately $7,293,000 over 20 years, the Google BigQuery solution is significantly more cost-effective.
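Putting the two 20-year totals from this use case side by side:

```python
# 20-year totals for Use Case #2, using the figures from the tables above.
YEARS = 20
onprem = 633_000 + 333_000 * YEARS   # one-time setup + annual maintenance
bigquery = 30_014.4 * YEARS          # BigQuery annual cost

print(f"On-premise over 20 years: ${onprem:,.0f}")    # $7,293,000
print(f"BigQuery over 20 years:   ${bigquery:,.0f}")  # $600,288
print(f"Cloud is roughly {onprem / bigquery:.0f}x cheaper")  # roughly 12x
```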

Conclusion

As seen in the examples, Use Case #1 demonstrates significant savings with an on-premise model, while Use Case #2 shows substantial cost savings with a cloud architecture.

On-premise can offer savings over time by avoiding continuous cloud subscription fees. However, the initial investment is high, and it requires ongoing maintenance.

Cloud solutions are attractive for SMBs with limited upfront capital and the need for flexibility. The pay-as-you-go model and reduced IT overhead make cloud services ideal for businesses expecting growth or fluctuating workloads.

Decision Factors:

  • Workload Stability: On-premise for stable needs; cloud for variable demands.
  • Budget: Consider upfront (on-premise) vs. ongoing costs (cloud).
  • Growth Rate: On-premise may be more cost-effective for low growth; cloud benefits high growth and variability.
  • IT Resources: Cloud reduces management complexity and IT staffing needs.

Choose based on your company’s growth expectations, financial capacity, and resource availability.
