Pricing is a significant challenge in data warehousing. When choosing an architecture for the long term, it is crucial to understand the ongoing costs. Many teams rush to adopt the latest technology to relieve technical frustrations, only to run into significant hidden monthly expenses. The goal should be to preserve capital and spend only what is necessary. There is no one-size-fits-all solution, but aiming for the most cost-efficient outcome is essential.
In today’s post, I am going to walk through two use cases – one for a small private business and one for an enterprise.
The purpose of this post is not to provide an in-depth analysis of how each cloud provider operates, but rather to give an overview of the costing. Keep in mind that these are theoretical use cases, and actual costs may vary.
Here are the three cloud giants:
Amazon Redshift
Amazon Redshift works seamlessly with numerous data loading and ETL tools, as well as BI reporting, data mining, and analytics tools. Built on PostgreSQL, it is compatible with most SQL client applications with minimal modifications.
Google BigQuery
Google BigQuery integrates effortlessly with various data loading, ETL tools, and business intelligence (BI) reporting, data mining, and analytics tools. As a serverless, highly scalable data warehouse, it supports SQL-like queries, making it user-friendly for those familiar with SQL. BigQuery’s native support for machine learning and advanced analytics allows for comprehensive data analysis without requiring extensive modifications to existing client applications.
Azure Synapse Analytics
Azure Synapse Analytics, formerly known as SQL Data Warehouse, integrates with a wide range of data loading, ETL tools, and BI reporting, data mining, and analytics tools. Leveraging the power of Microsoft’s SQL Server, it offers compatibility with many SQL client applications with minimal changes needed. Synapse Analytics provides a unified platform for data warehousing and big data analytics, enabling seamless data integration, exploration, and management.
Use Case #1
Hutchinson and Co. is a retail company with 100 employees and $10 million in annual gross profit. For their data analytics needs, they have 10 employees who will use the data warehouse several times per week, and 1 database administrator responsible for maintaining the ETL pipeline and ensuring data validation. Their primary focus is on BI reports.
Here is a rough ChatGPT-style estimate of monthly transactions, to get a feel for how many records will be produced:
# Steps to estimate monthly transactions
# Step 1: Calculate monthly gross profit
annual_gross_profit = 10_000_000
monthly_gross_profit = annual_gross_profit / 12
# Step 2: Calculate monthly transactions
average_product_price = 100
monthly_transactions = monthly_gross_profit / average_product_price
# Conclusion
# Monthly gross profit: $833,333.33
# Monthly transactions: ~8,333
# Yearly transactions: 10,000,000 / 100 = 100,000
Data Estimation
Step-by-Step Estimation
Transaction Record Size:
As previously calculated:
162 bytes per transaction.
Total storage for transactions per year:
100,000 × 162 = 16,200,000 bytes ≈ 15.45 MB
Product Data:
Let's assume the company has 10,000 unique products. Each product record might include:
- Product ID: 10 bytes
- Product Name: 50 bytes
- Description: 200 bytes
- Price: 8 bytes
- Category: 20 bytes
- Image: Assume each image is 200 KB
Total size per product record:
10 + 50 + 200 + 8 + 20 + (200 × 1,024) = 205,088 bytes ≈ 205,000 bytes
Total storage for product data:
10,000 × 205,000 = 2,050,000,000 bytes ≈ 1,955 MB ≈ 1.91 GB
Customer Data:
Assume the company has 50,000 customers. Each customer record might include:
- Customer ID: 10 bytes
- Name: 50 bytes
- Email: 50 bytes
- Address: 100 bytes
- Purchase History: Assume 1 KB per customer
Total size per customer record:
10 + 50 + 50 + 100 + 1024 = 1,234 bytes
Total storage for customer data:
50,000 × 1,234 = 61,700,000 bytes ≈ 58.86 MB
Multimedia Data:
Assuming the company stores product images and promotional videos. Let's estimate:
- Product images: Already included in product data.
- Videos: Assume 100 promotional videos, each 50 MB.
Total storage for videos:
100 × 50 MB = 5,000 MB = 5 GB
Annual Data Storage Requirement:
Transaction data: 15.45 MB/year
Product data: 1.91 GB
Customer data: 58.86 MB
Multimedia data: 5 GB
Total Storage for 1 Year:
15.45 MB + 1.91 GB + 58.86 MB + 5 GB
= 15.45 MB + 1,955 MB + 58.86 MB + 5,000 MB
= 7,029.31 MB ≈ 6.86 GB
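As a sanity check, here is a minimal Python sketch that reproduces this estimate from the same assumptions (the record sizes, product and customer counts, and media sizes stated above, with 1 MB = 1,024 KB):

# Minimal sketch of the first-year storage estimate for Use Case #1,
# using the assumptions above (all sizes in bytes).
MB = 1024 ** 2
transactions = 100_000 * 162                                # ≈ 15.45 MB
products = 10_000 * (10 + 50 + 200 + 8 + 20 + 200 * 1024)   # ≈ 1.91 GB
customers = 50_000 * (10 + 50 + 50 + 100 + 1024)            # ≈ 58.8 MB
videos = 100 * 50 * MB                                      # 5 GB of promotional videos
total_mb = (transactions + products + customers + videos) / MB
print(f"Total first-year storage: {total_mb:,.0f} MB (~{total_mb / 1024:.2f} GB)")
# ≈ 7,030 MB ≈ 6.87 GB (a hair above the rounded figure above, purely due to rounding)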
Hypothetical On-Premise Costing – Use Case #1

| Item | Description | Cost (USD) |
| --- | --- | --- |
| Server Hardware | | |
| Processor | Intel Xeon E3-1220 v6 (3.0 GHz, 4 cores) | 250 |
| RAM | 32 GB DDR4 | 150 |
| Storage (Primary) | 2 x 500 GB SSD (RAID 1) | 300 |
| Storage (Backup) | 2 x 1 TB HDD (RAID 1) | 100 |
| Motherboard | Compatible server motherboard | 200 |
| Power Supply Unit (PSU) | 500 W | 100 |
| Chassis | Standard 1U/2U rack-mount chassis | 150 |
| Subtotal for Server Hardware | | 1,250 |
| Network Infrastructure | | |
| Ethernet Switch | Gigabit Ethernet switch (24-port) | 200 |
| Cables and Patch Panels | Ethernet cables and patch panels | 100 |
| Subtotal for Network | | 300 |
| Software and Licensing | | |
| PostgreSQL | Open source | Free |
| Operating System | Windows Server | 500 |
| Backup Software | Veeam | 400 |
| Subtotal for Software | | 900 |
| Backup Solutions | | |
| External Backup Drives | For offsite backup (2 x $100) | 200 |
| Subtotal for Backup Solutions | | 200 |
| Maintenance and Support | Covered as an annual cost of $750/year (see below) | – |
| Total Initial Setup Cost | | 2,650 |
For good measure, we round the initial setup estimate up to $5,000.
Total Cost Calculation Over 20 Years
Initial Setup Cost: $5,000 (one-time)
Annual Maintenance Cost for 20 years:
750 USD/year × 20 years = 15,000 USD
Total Cost Over 20 Years
Total Cost = Initial Setup Cost + Annual Maintenance Cost for 20 years
Total Cost = 5,000 USD + 15,000 USD
Total Cost = 20,000 USD
Estimating Cloud Machine – Use Case #1
Cloud Machine Estimate – here we can see a cloud machine with 4 vCPUs and 26 GB of memory priced at $270.46 per month, but let’s bump that up to $350 for our estimation.
On-Premise vs. Cloud – Use Case #1
After 1 Year:
On-Premise:
5,000 + (750 maintenance × 1) = 5,750 USD
Cloud:
350 × 12 = 4,200 USD
On-premise is more expensive after 1 year.
After 2 Years:
On-Premise:
5,000 + 750 × 2 = 6,500 USD
Cloud:
350 × 24 = 8,400 USD
On-premise is cheaper after 2 years.
After 5 Years:
On-Premise:
5,000 + 750 × 5 = 8,750 USD
Cloud:
350 × 60 = 21,000 USD
On-premise is significantly cheaper after 5 years.
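To pin down exactly where the lines cross, here is a minimal sketch that finds the first month in which the cumulative on-premise cost (the $5,000 setup plus $750/year maintenance used above) falls below the $350/month cloud estimate:

# Minimal sketch: break-even month between the two estimates above.
# Assumes $5,000 one-time setup + $750/year maintenance on-premise,
# versus a flat $350/month cloud machine.
def on_prem(months):
    return 5_000 + 750 * (months / 12)

def cloud(months):
    return 350 * months

break_even = next(m for m in range(1, 241) if on_prem(m) <= cloud(m))
print(f"On-premise becomes cheaper in month {break_even} "
      f"({on_prem(break_even):,.0f} vs {cloud(break_even):,.0f} USD)")
# On-premise becomes cheaper in month 18 (6,125 vs 6,300 USD)

In other words, the on-premise option pulls ahead partway through the second year, which matches the year-by-year comparison above.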
Hypothetical On-Premise Costing – Use Case #2
Use Case #2 assumes an enterprise with 1,000 employees and roughly 100 TB of storage to manage. Here is the hypothetical on-premise costing:

| Item | Description | Initial Cost (USD) | Annual Cost (USD) |
| --- | --- | --- | --- |
| Server Hardware | 10 high-performance servers | 323,000 | 30,000 |
| Storage Solutions | SAN/NAS systems and additional disks | 150,000 | – |
| Network Infrastructure | Switches, cables, security devices | 45,000 | – |
| Software and Licensing | OS, database support, backup software | 45,000 | 30,000 |
| Backup Solutions | Offsite backup and backup drives | 70,000 | – |
| Maintenance and Support | Hardware maintenance and IT staff | – | 180,000 |
| Data Center Costs | Rack space rental, power, and cooling | – | 68,000 |
| Power and Cooling | Power consumption and cooling systems | – | 25,000 |
| Total Initial Setup Cost | | 633,000 | |
| Total Annual Maintenance Cost | | | 333,000 |
Summary
- Total Initial Setup Cost: $633,000
- Total Annual Maintenance Cost: $333,000
Cost Over 20 Years
To calculate the total cost over 20 years:
Initial Setup Cost: $633,000 (one-time)
Total Annual Maintenance Cost for 20 years:
333,000 × 20 = 6,660,000 USD
Total Cost Over 20 Years:
633,000 + 6,660,000 = 7,293,000 USD
The total cost for setting up and maintaining an on-premise solution with 100 TB of storage for a company with 1,000 employees over 20 years is approximately $7,293,000.
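The same cumulative-cost arithmetic, as a short sketch using the setup and annual figures from the table above:

# Minimal sketch: cumulative on-premise cost for Use Case #2,
# using the $633,000 setup and $333,000/year maintenance figures above.
def enterprise_on_prem(years, setup=633_000, annual=333_000):
    return setup + annual * years

print(f"{enterprise_on_prem(20):,} USD over 20 years")   # 7,293,000 USD over 20 years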
Total Cost Estimate for Google BigQuery
| Cost Component | Description | Annual Cost (USD) |
| --- | --- | --- |
| Storage Costs | 100 TB of active storage | 24,000 |
| Query Costs | 10 TB of data processed per month | 600 |
| Data Ingestion Costs | 1 TB of data ingested per month | 614.4 |
| Backup Storage Costs | 20 TB of backup storage | 4,800 |
| Total Annual Cost | | 30,014.4 |
Cost Over 20 Years
Total Annual Cost: 30,014.4 USD
Total Cost Over 20 Years:
30,014.4 USD/year × 20 years = 600,288 USD
Summary
- Total Annual Cost for Google BigQuery: $30,014.4
- Total Cost Over 20 Years for Google BigQuery: $600,288
Compared to the on-premise solution, which costs approximately $7,293,000 over 20 years, the Google BigQuery solution is significantly more cost-effective.
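For reference, here is a minimal sketch that reconstructs the BigQuery estimate above. The unit prices are assumptions used for this exercise (roughly $0.02/GB/month for active storage, $5/TB scanned for on-demand queries, and $0.05/GB for streaming ingestion); always check current BigQuery pricing. The original estimate mixes 1,000 and 1,024 GB per TB, and the same convention is kept here so the figures match.

# Minimal sketch reconstructing the BigQuery estimate above.
# Unit prices are assumptions for this exercise; check current pricing.
storage   = 100 * 1_000 * 0.02 * 12    # 100 TB active storage    -> 24,000
queries   = 10 * 5 * 12                # 10 TB scanned per month  -> 600
ingestion = 1 * 1_024 * 0.05 * 12      # 1 TB ingested per month  -> 614.4
backup    = 20 * 1_000 * 0.02 * 12     # 20 TB backup storage     -> 4,800
annual = storage + queries + ingestion + backup                      # 30,014.4
print(f"BigQuery over 20 years:   {annual * 20:,.0f} USD")           # 600,288
print(f"On-premise over 20 years: {633_000 + 333_000 * 20:,} USD")   # 7,293,000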
Conclusion
As seen in the examples, Use Case #1 demonstrates significant savings with an on-premise model, while Use Case #2 shows substantial cost savings with a cloud architecture.
On-premise can offer savings over time by avoiding continuous cloud subscription fees. However, the initial investment is high, and it requires ongoing maintenance.
Cloud solutions are attractive for SMBs with limited upfront capital and the need for flexibility. The pay-as-you-go model and reduced IT overhead make cloud services ideal for businesses expecting growth or fluctuating workloads.
Decision Factors:
- Workload Stability: On-premise for stable needs; cloud for variable demands.
- Budget: Consider upfront (on-premise) vs. ongoing costs (cloud).
- Growth Rate: On-premise may be more cost-effective for low growth; cloud benefits high growth and variability.
- IT Resources: Cloud reduces management complexity and IT staffing needs.
Choose based on your company’s growth expectations, financial capacity, and resource availability.