
Data Ingestion Strategy for Multi-Terabyte Photo Transfer from NZ to AWS Sydney Region

Enterprise-Grade Design for Bulk Photo Ingestion into AWS Sydney (ap-southeast-2)

(leveraging the existing AWS Direct Connect between Auckland & Sydney)


0 Assumptions & Project Objectives

Item Value / Note
Data set “Several terabytes” (model 10 TB and 50 TB scenarios).
Source sites Supplier storage in Auckland & a smaller depot in Australia.
Connectivity 1 Gbps Direct Connect (DX) public VIF already provisioned between the NZ datacentre and AWS Sydney. Port cost: $0.30/h for 1 Gbps ports (excluding Japan) (Amazon Web Services, Inc.).
Landing zone S3 general-purpose bucket with Versioning + SSE-KMS; optional replication to ap-southeast-4 (Melbourne) for DR.
Integrity Regulatory requirement for cryptographic proof that every object arrived un-corrupted.

1 Step-by-Step Transfer Strategy — DX ▸ DataSync (primary)

Phase Detailed Tasks Rationale
1 Pre-flight audit Supplier generates a SHA-256 manifest (e.g., sha256sum -b **/* > manifest.txt with globstar enabled) and stores an immutable copy in S3 Glacier Deep Archive. Immutable baseline for later reconciliation.
2 Deploy agent Launch AWS DataSync agent as a VMware OVA or Docker on supplier’s ESXi/KVM host. Outbound ports 443 & 1024-1064 only. Agent pulls over DX public VIF; no inbound firewall holes.
3 Create DataSync task Source = NFS/SMB location, Destination = S3 bucket; select Enhanced mode; set bandwidth throttle to 850 Mbps. Enhanced mode parallelises LIST/PUT and checksum streams.
4 Pilot transfer (10 GB) Verify end-to-end checksum report, CloudWatch throughput ≈ 800 Mbps. Confirms BGP path over DX and LAN read speed.
5 Bulk run Execute the task with “Keep deleted files” off and “Verify mode” on. Every object is checksum-verified by DataSync while in flight.
6 Post-transfer delta If supplier will add late files, schedule daily incremental DataSync job. Bills only for new GB; avoids Snowball reship.
7 Reconciliation Download the DataSync task report, pull per-object checksums (e.g., aws s3api get-object-attributes --object-attributes Checksum), and diff against the manifest (see the sketch after this table). Provides formal acceptance artefact for compliance.
8 Lifecycle & DR Create lifecycle rule: > 90 days ➜ S3 Glacier Instant Retrieval; enable CRR to Melbourne. Controls storage OPEX & meets BC policy.
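
The two bookends of this table (Phase 1 manifest and Phase 7 reconciliation) can be scripted in a few lines. A minimal sketch, assuming GNU coreutils and AWS CLI v2; the bucket names are placeholders, keys are assumed to contain no spaces, and this pass compares object inventories only; per-object checksum comparison is covered in Phase 7 and §3.

    # Phase 1 - build the SHA-256 manifest at the supplier site (run from the share root)
    find . -type f -print0 | xargs -0 sha256sum > manifest.txt

    # Park an immutable baseline copy straight into Glacier Deep Archive
    aws s3 cp manifest.txt s3://photo-audit-artifacts/manifest-$(date +%Y%m%d).txt \
        --storage-class DEEP_ARCHIVE

    # Phase 7 - list what actually landed and diff the key inventory against the manifest
    aws s3 ls s3://photo-landing-bucket --recursive | awk '{print $4}' | sort > landed.txt
    awk '{print $2}' manifest.txt | sed 's|^\./||' | sort > expected.txt
    diff expected.txt landed.txt && echo "object inventory matches manifest"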

2 Network Performance Deep-Dive

2.1 Bandwidth & Throughput Calculation

Round-Trip Time (RTT) NZ ⇄ Sydney ≈ 22 ms over Southern Cross cable (11 ms one-way) (Amazon Web Services, Inc.).

BDP = Bandwidth × RTT For 1 Gbps link → BDP ≈ (1 000 Mbps × 0.022 s) = 22 Mbit ≈ 2.75 MB.

TCP window scaling easily handles this; DataSync opens 16 – 32 parallel streams so you should see ~850 Mbps sustained after protocol overhead.
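
The same arithmetic, written out in the notation used for the duration model later in this document:

$$ \text{BDP} = B \times \text{RTT} = 1\,\text{Gbps} \times 0.022\,\text{s} = 22\,\text{Mbit} \approx 2.75\,\text{MB} $$

$$ \text{Duration (hrs)} = \frac{\text{TB} \times 8{,}192}{\text{Gbps}_{\text{eff}} \times 3{,}600} \;\Rightarrow\; \frac{10 \times 8{,}192}{0.8 \times 3{,}600} \approx 28\ \text{h} $$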

2.2 DX Path Optimisations

  1. Public VIF to S3 – ensure the S3 prefixes (52.95.128.0/17 etc.) are advertised.
  2. Jumbo frames (MTU 9001) on the provider’s L2 where a private VIF is used; public VIFs remain at a 1500-byte MTU.
  3. Enable BFD (Bidirectional Forwarding Detection) with ~300 ms timers for faster failover to an Internet backup path if DX drops.
  4. MACsec at the telco CPE (optional) to satisfy NZ Privacy Act encryption in transit.

2.3 What About Internet Uploads?

If the supplier’s AU node also needs to push data, give it an s3-accelerate endpoint; latency AU ➜ AU edge ≈ 5 ms, but note the $0.08/GB Transfer-Acceleration surcharge (Amazon Web Services, Inc.).
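
If that route is chosen, the AU depot only needs two changes on its side. A minimal sketch, assuming AWS CLI v2 and a placeholder bucket name; Transfer Acceleration must first be enabled on the bucket, as shown:

    # Enable Transfer Acceleration on the landing bucket (one-off, by the bucket owner)
    aws s3api put-bucket-accelerate-configuration \
        --bucket photo-landing-bucket \
        --accelerate-configuration Status=Enabled

    # Tell the uploader's CLI to use the s3-accelerate endpoint, then copy as usual
    aws configure set default.s3.use_accelerate_endpoint true
    aws s3 cp ./photos/ s3://photo-landing-bucket/au-depot/ --recursive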


3 Data Consistency & Integrity Controls

Layer Mechanism Notes
Transport TLS 1.2+, DX encryption (MACsec) Prevents tampering in transit.
Application DataSync per-object checksum + automatic retry Built-in; no extra scripting (Amazon Web Services, Inc.).
Storage S3 Versioning + Object Lock (Compliance) Immutable retention; defends against accidental overwrite.
Audit CloudTrail, DataSync task logs, manifest diff Traceability for every object PUT.
DR Cross-Region Replication to Melbourne Protects against region-wide disaster.
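
A minimal CLI sketch of the storage-layer controls above; the bucket name and KMS alias are placeholders, and Object Lock can only be applied to a bucket that was created with Object Lock enabled:

    # Versioning + default SSE-KMS on the landing bucket
    aws s3api put-bucket-versioning --bucket photo-landing-bucket \
        --versioning-configuration Status=Enabled
    aws s3api put-bucket-encryption --bucket photo-landing-bucket \
        --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"alias/photos-landing"}}]}'

    # Object Lock default retention in Compliance mode (7 years, per the audit requirement)
    aws s3api put-object-lock-configuration --bucket photo-landing-bucket \
        --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":2555}}}'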

4 Executive Technical Analysis – Cost Model (USD, ex-GST)

Option One-time 10 TB One-time 50 TB Throughput / Time Remarks
[+] DX + DataSync (primary) DataSync fee $0.0125/GB ⇒ ≈ $128; DX data-in $0; port already paid ≈ $640 10 TB ≈ 28 h @ 800 Mbps Predictable; all-software
[+] Snowball Edge (80 TB device) $1,800 flat job + shipping (Amazon Web Services, Inc.) same LAN-bound; 1.5 GB/s device ingest → 10 TB ≈ 2 h, plus shipping (3–5 days) Use if WAN < 200 Mbps or > 80 TB.
[+] WorkSpaces jump host $31/mo + $0.28/h; data-in free same Still uses same DX/Internet; no speed boost Adds VDI complexity; N/A for bulk transfer.
[-] S3 Transfer Acceleration S3-TA fee $0.08/GB ⇒ ≈ $819 ≈ $4,096 Minor speed gain at 22 ms Economics unfavourable.
[-] DX + CLI/SDK multipart $0 (no DataSync) $0 Same speed Must script checksums + retries manually

Observation: the DX port charge is already a sunk cost for the enterprise; incremental spend is essentially the $0.0125/GB DataSync fee (or $0 if you script your own multipart uploads).


5 Professional Recommendation

Criterion DX + DataSync DX + CLI Snowball S3 TA
Cost efficiency ★★★★☆ ★★★★★ ★★☆☆☆ ★☆☆☆☆
Integrity automation ★★★★★ ★★☆☆☆ ★★★★★ ★★☆☆☆
Operational effort ★★★★☆ ★★☆☆☆ ★★★☆☆ ★★★☆☆
Speed (given 1 Gbps DX) ★★★★☆ ★★★★☆ ★★★★★ (device) ★★★★☆

Deploy AWS DataSync over the existing Direct Connect path as the production method. It combines wire-speed transfer, cryptographic verification, and the lowest incremental cost while keeping the supplier entirely off the public Internet.

Fallback – If DX utilisation spikes or bandwidth tests fall below 200 Mbps, issue a Snowball Edge Storage-Optimized job.

Not recommended – Transfer Acceleration (cost premium) or WorkSpaces (adds no throughput).


6 Mermaid Implementation Diagram

Primary path (DX-backed DataSync):

flowchart LR
    subgraph Supplier NZ DC
        NAS[On-prem Photo NAS / SAN]
        DSAgent[DataSync Agent<br>on-prem VM or Docker]
    end
    subgraph Direct Connect
        DXpath[[1 Gbps Public VIF<br>RTT ≈ 22 ms]]
    end
    subgraph AWS Sydney
        S3[(S3 Landing Bucket<br>SSE-KMS, Versioning)]
        Inv[Manifest & Task Reports<br>in S3/Glacier]
        CRR[Melbourne/AUS Copy: optional]
    end
    NAS --> DSAgent
    DSAgent --> DXpath --> S3
    S3 --> Inv
    S3 -.replicate.-> CRR

    %% Contingency
    NAS -. "If WAN slow > 40 TB" .-> SB[Snowball Edge 80-210 TB]
    SB -. Ship / Import .-> S3

Alternate path (Internet upload from the supplier, shown for comparison):

flowchart LR
    subgraph Supplier NZ
        A[On-prem NAS / Photo Store]
        DSAgent[DataSync Agent: VM or Docker]
    end
    subgraph Internet["Internet: 22 ms RTT"]
        B((TLS + Parallel TCP))
    end
    subgraph AWS Sydney
        S3[(S3 Landing Bucket<br>Versioning + SSE-KMS)]
        Val[Data Integrity Report: manifest compare]
    end
    A -->|Parallel verified stream| DSAgent
    DSAgent --> B --> S3
    S3 --> Val
    %% Optional paths
    A -.->|>40 TB alt path| SB[Snowball Edge Device]
    SB -.->|Courier| S3
    A -.->|Future recurring flows| DX[Direct Connect 1 Gbps]
    DX -.-> S3

7 Next Technical & Procurement Actions

  1. Bandwidth Test – run iperf3 -c speedtest.data.aws across DX to validate 800 Mbps+.
  2. IAM & KMS – create least-privilege role DataSync-IngestRole bound to bucket key alias /photos/landing (see the sketch after this list).
  3. Cost Tagging – apply aws:createdBy=SupplierUpload to bucket & DX usage for show-back.
  4. Compliance Artefacts – archive CloudTrail, DX flow logs, DataSync reports for 7 years.
  5. Supplier SOP – supply a four-page run-book (“unlock agent, start task, monitor CloudWatch”) with screen-shots.
  6. Procurement Note – No new hardware CAPEX; incremental OPEX ≈ $0.0125/GB only—approve under existing cloud spend envelope.
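
A hedged sketch of item 2: the role name comes from the list above, while the bucket name, KMS key ARN, account ID, and exact action list are illustrative and should be tightened to the account’s own conventions.

    # Least-privilege role assumed by the DataSync service
    aws iam create-role --role-name DataSync-IngestRole \
        --assume-role-policy-document '{
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "datasync.amazonaws.com"},
            "Action": "sts:AssumeRole"
          }]
        }'

    # Inline policy scoped to the landing bucket and its KMS key (placeholders throughout)
    aws iam put-role-policy --role-name DataSync-IngestRole --policy-name photo-landing-write \
        --policy-document '{
          "Version": "2012-10-17",
          "Statement": [
            {"Effect": "Allow", "Action": ["s3:ListBucket", "s3:GetBucketLocation"], "Resource": "arn:aws:s3:::photo-landing-bucket"},
            {"Effect": "Allow", "Action": ["s3:PutObject", "s3:GetObject", "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts"], "Resource": "arn:aws:s3:::photo-landing-bucket/*"},
            {"Effect": "Allow", "Action": ["kms:GenerateDataKey", "kms:Decrypt"], "Resource": "arn:aws:kms:ap-southeast-2:111122223333:key/REPLACE-ME"}
          ]
        }'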

Delivering via DX-backed DataSync satisfies the AWS Well-Architected Security, Reliability, and Cost-Optimization pillars while giving business stakeholders a clear, auditable chain-of-custody and the fastest time-to-value.


=== WIP ==>

Executive Technical Analysis – Moving “Several Terabytes” of Photos to AWS Sydney (ap-southeast-2)

Option One-time data-transfer cost (10 TB example) Throughput / Time-to-complete¹ Operational touch-points Data-integrity guarantees Typical break-even size
AWS DataSync (agent VM at supplier, Internet) $0.0125 / GB ⇒ ≈ $128 for 10 TB (Amazon Web Services, Inc.) Saturates 1 Gbps link; 10 TB ≈ 28 h @ 0.8 Gbps Purely software; no logistics End-to-end checksum on every file; automatic retries, IAM & TLS 0 – ≈ 40 TB
Snowball Edge Storage-Optimized Flat job fee $1,800 (≤ 100 TB) + shipping (Amazon Web Services, Inc.) Limited by on-prem LAN; device holds 80 TB usable Order → ship → load → ship → ingest AES-256, TPM key, chain-of-custody barcode scans ≥ 40 TB or where WAN is slow/unreliable
S3 Transfer Acceleration $0.08 / GB (NZ → AU edge class) ⇒ ≈ $819 for 10 TB (Amazon Web Services, Inc.) Uses CloudFront PoPs; gains minimal on 22 ms RTT No device, but expensive Checksums only if client does multipart with --checksum Rarely cost-effective within Tasman
Direct Connect (1 Gbps hosted port, 30 days) Port $0.30/h ⇒ ≈ $216 + cross-connect + carrier; data-in free (Amazon Web Services, Inc.) Wire-speed; reusable for future flows Telco lead-time, LOA/CFA Private, deterministic path; your own QoS Multi-year, multi-PB use cases
WorkSpaces (Standard bundle as jump host) Desktop $31/mo + $0.28/h (Amazon Web Services, Inc.); data-in free Same Internet path; no speed gain Adds VDI layer & user friction Relies on whatever client tool you run Not a transfer solution—only an admin console

¹ Transfer time (s) ≈ (TB × 1,024 × 8) / (effective Gbps). E.g., 10 TB at 0.8 Gbps (a 1 Gbps link at 80 % utilisation) ≈ 102,400 s ≈ 28 hours.


1 Data Consistency & Integrity

Best Practice Rationale / Implementation
Pre-flight manifest & hashes Supplier generates SHA-256 manifest; store in S3 Glacier Deep Archive for audit.
AWS DataSync verification Computes checksums at source and destination—blocks corrupt files automatically (Amazon Web Services, Inc.).
Multipart upload with checksums If using CLI: aws s3 cp --recursive --checksum-algorithm sha256 --expected-size … ensures reject-on-mismatch (see the sketch after this table).
S3 safeguards Enable Versioning, Object Lock (Compliance mode), and Default Encryption (SSE-KMS) in landing bucket.
Post-transfer inventory Use S3 Inventory or aws s3api list-objects + hash compare to manifest for final reconciliation.
Cross-Region backup (optional) Replicate to ap-southeast-4 (Melbourne) for DR once landed.
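
Expanding the multipart-upload row above, a minimal sketch with AWS CLI v2; bucket and key names are placeholders, and note that S3 reports the stored checksum base64-encoded, whereas sha256sum emits hex, so convert before comparing.

    # Upload with SHA-256 checksums attached to every object/part
    aws s3 cp ./photos/ s3://photo-landing-bucket/batch-01/ --recursive \
        --checksum-algorithm SHA256

    # Spot-check one landed object's stored checksum and size against the manifest entry
    aws s3api get-object-attributes --bucket photo-landing-bucket \
        --key batch-01/IMG_0001.CR3 --object-attributes Checksum ObjectSize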

Snowball offers its own inline hashing during ingest; logs are available in the Snowball console for proof of custody. In all cases, keep CloudTrail turned on for evidentiary logging.


2 Network Latency & Bandwidth Considerations

  • Physical RTT NZ ↔ Sydney ≈ 22 ms over Southern Cross cable (Amazon Web Services, Inc.).

  • This is low-latency; TCP window scaling easily reaches ~500–800 Mbps on modern broadband or 10 Gb DIA links without special acceleration.

  • DataSync opens parallel TCP streams and pipeline reads/writes, typically hitting 80–90 % of line-rate without tuning.
  • S3 Transfer Acceleration shines on >100 ms latencies; here it adds cost but little speedup—AWS even auto-bypasses (and waives fee) when gain < 1 %.
  • Direct Connect removes Internet variability and enforces deterministic throughput, but 1-month port commits rarely justify a one-off job.

If supplier bandwidth is the bottleneck (e.g., 100 Mbps link), Snowball becomes the faster path (truck > cables rule).


3 Cost Optimisation Summary

  1. 0–40 TB & usable pipe → DataSync is the lowest cash outlay (≈ $0.0125/GB) and zero logistics.
  2. 40–100 TB or slow/unstable WAN → Snowball Edge is cheaper (flat $1.8 k) and faster end-to-end than weeks of trickle uploads.
  3. Multi-year recurring transfers, > 100 TB/month → Direct Connect with DataSync riding the private link yields long-term savings.
  4. WorkSpaces adds no economic or performance benefit for bulk transfer; reserve it for ad-hoc desktop administration only.
  5. Disable S3 Transfer Acceleration unless empirical test shows > 20 % speedup; otherwise it increases bill 6-fold relative to DataSync.

Professional Recommendation

Use AWS DataSync with an on-prem VM agent in New Zealand for this one-time “several-TB” load. It is the most cost-effective, operationally lean, and cryptographically verifiable approach given the sub-30 ms Tasman latency and enterprise security requirements.

Fallback – If supplier bandwidth tests show sustained throughput < 200 Mbps or the data set grows beyond ~40 TB, pivot to a Snowball Edge Storage-Optimized job.

Not recommended – WorkSpaces (adds VDI cost without throughput gain) and S3 Transfer Acceleration (cost premium, negligible latency gain here).

Trade-offs at a glance

  • DataSync – Pros: pay-as-you-go, automated integrity checks, no shipping risk. Cons: reliant on WAN capacity, per-GB charge.
  • Snowball – Pros: predictable flat cost, offline bulk load, encrypted at rest. Cons: physical handling, 3–5 days shipping, 210 TB device limit, no incremental sync.
  • Direct Connect – Pros: dedicated, reusable. Cons: provisioning lead-time, commit contracts, capex-like cost model.
  • WorkSpaces – Pros: familiar desktop UI. Cons: duplicates the transfer path, is slower, widens the operational blast radius, and still pays DataSync/PUT fees.

Key Next Steps for the Project Team

  1. Kick-off bandwidth test – iperf3 from the supplier site to the speedtest.data.aws endpoint in Sydney (see the sketch after this list).
  2. Deploy DataSync agent (OVA/AMI or Docker) on supplier’s VM host; open ports 443 & 1024-1064 outbound only.
  3. Create least-privilege IAM role with DataSync, S3:PutObject*, KMS:Encrypt privileges bound to landing bucket KMS key.
  4. Execute trial run (10 GB); review DataSync task reports for throughput & checksum stats.
  5. Scale to the full dataset and monitor CloudWatch metrics; alert if the verification-error count is non-zero.
  6. Post-ingest lifecycle rules – Transition originals > 90 days to S3 Glacier Instant Retrieval to control storage costs.
  7. Document chain-of-custody in Confluence for compliance audit.
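
For step 1, if that speedtest endpoint is not reachable from the supplier network, a self-managed iperf3 target in Sydney works just as well. A minimal sketch; the temporary EC2 target and its IP are assumptions, not part of the recommended architecture.

    # Sydney side: temporary EC2 instance, security group opened only to the supplier's egress IP
    iperf3 -s -p 5201

    # Supplier side (NZ): 8 parallel streams for 60 s, then repeat in reverse (-R) for the return path
    iperf3 -c <sydney-target-ip> -p 5201 -P 8 -t 60
    iperf3 -c <sydney-target-ip> -p 5201 -P 8 -t 60 -R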

Delivering this way aligns with AWS Well-Architected Operational Excellence, Security, and Cost Optimisation pillars while satisfying procurement’s need for transparent, predictable spend.


Below is a rigorous enterprise-grade cloud architecture analysis and recommendation tailored to the photo ingestion scenario, with special focus on data integrity, performance, and cost optimization, presented for both architecture and business stakeholders.


Scenario Summary

The enterprise must ingest several terabytes (TBs) of high-value photo assets into AWS Sydney (ap-southeast-2) from a supplier with resources in both Australia and New Zealand. The original plan was to use AWS Snowball, but due to operational preference, the supplier now wishes to upload from New Zealand over the internet into AWS Sydney.


1. Data Consistency & Integrity

Transferring terabytes of data over the internet introduces real-world risks such as packet loss, incomplete uploads, and human/operator errors. Thus, data integrity must be programmatically enforced at each stage:

Best Practices for Ensuring Consistency

  • Use Amazon S3 Multipart Uploads: Enables parallel uploading of parts (recommended for files >100MB), with each part checksum-verified before commit.
  • Enable S3 Bucket Versioning + Object Lock: Guarantees immutability and auditability of uploaded files during transfer window.
  • SHA-256 Hash Validation: Supplier should generate checksums pre-upload; validate after upload with the AWS SDK/CLI (note that multipart-upload ETags are not plain MD5 digests, so compare the stored checksum fields or re-hash rather than relying on ETag alone).
  • Transfer Resume Support: Use AWS Transfer Acceleration + AWS CLI or the S3 API with retry/backoff logic (e.g., aws s3 sync with --exact-timestamps, --only-show-errors, and --checksum-algorithm sha256).
  • Staging Directory / Manifest Logs: Maintain a log of transferred files, status, and any retry attempts. Store separately in DynamoDB or S3 for validation and reporting.
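
For the manifest-log bullet, a hedged sketch of one way to record per-file status in DynamoDB; the table name, attribute names, and values are illustrative only.

    # One item per transferred file (table assumed to exist with object_key as partition key)
    aws dynamodb put-item --table-name photo-transfer-log --item '{
      "object_key":   {"S": "batch-01/IMG_0001.CR3"},
      "sha256":       {"S": "<sha256-hex>"},
      "status":       {"S": "UPLOADED"},
      "attempts":     {"N": "1"},
      "completed_at": {"S": "2025-07-01T03:15:00Z"}
    }'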

2. Network Latency & Bandwidth (NZ to AU over Internet)

While NZ–AU is relatively close geographically, cross-border internet traffic is subject to ISP peering, potential bottlenecks, and cost-per-GB metering, especially when sustained over terabytes.

Performance Considerations

Factor Details
Latency ~20–30ms typical NZ to Sydney over consumer broadband/fiber
Bandwidth Stability May fluctuate during daytime. Dedicated fiber = optimal
Estimated Upload Time For 5TB @ 100Mbps = ~5 days nonstop (theoretical)

Optimization Tools

  • Amazon S3 Transfer Acceleration (TA):

  • Uses AWS edge locations in NZ (e.g., Auckland PoPs) to route uploads via optimized AWS backbone to S3.

  • 50–500% faster for cross-border uploads (AWS internal benchmarks).
  • AWS DataSync (over Internet):

  • Install agent on NZ resource; uploads to AWS S3 using optimized TCP, auto-compression, retry logic.

  • Full E2E visibility and integrity checks built-in.
  • Dedicated Upload Workload in AU (via WorkSpaces or EC2):

  • Deploy an Amazon WorkSpaces / EC2 instance in Sydney with EBS storage.

  • NZ user RDPs into the Sydney-hosted VM and uploads over AU-to-AU path (inside AWS backbone).
  • Reduces NZ-AU latency but still depends on stable NZ→AU RDP performance.
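
If the jump-host path were used anyway, the only tuning worth doing is on the CLI's multipart settings inside the Sydney session. A minimal sketch; the values are starting points, not benchmarks, and the bucket name is a placeholder.

    # Raise multipart concurrency and chunk size before the bulk copy
    aws configure set default.s3.max_concurrent_requests 20
    aws configure set default.s3.multipart_chunksize 64MB

    # Then copy from the locally attached EBS staging volume into the landing bucket
    aws s3 cp ./photos s3://photo-landing-bucket/au-depot/ --recursive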

3. Cost Optimization

Let’s evaluate Snowball, WorkSpaces, S3 Transfer Acceleration, DataSync, and Direct Connect for cost and complexity:

Cost Summary Table

Option Upfront Setup Performance NZ to AU Internet Dependency Estimated Cost (5TB) Notes
AWS Snowball Edge Medium High None ~$500–$800 total Secure, fully offline, no bandwidth needed
S3 Transfer Acceleration Low Medium–High Yes $0.04/GB * 5TB = ~$200 Fastest over-the-wire; easy CLI
AWS WorkSpaces + RDP Upload Medium Medium Yes $35–$50/month + EBS AU-internal path, but depends on RDP speed
AWS DataSync Medium High Yes $0.0125/GB * 5TB = $62.50 + EC2 Enterprise-grade with scheduling
Direct Connect High Very High No Long-term contract Not viable for one-time transfer

Final Recommendation (Enterprise-Grade)

Primary Option: AWS DataSync with NZ Agent → S3 (Sydney)

Why?

  • Enterprise-grade orchestration: Schedule, monitor, and automate uploads.
  • Data integrity: Built-in verification and retry logic.
  • Cost-effective: Significantly cheaper than Snowball and WorkSpaces.
  • Security: All data encrypted in transit via TLS; IAM role-scoped.

Fallback Option: AWS Snowball Edge (from Australia only)

If DataSync is not viable from NZ due to bandwidth or reliability, revert to Snowball via the Australian resource. This gives the highest throughput and avoids network bottlenecks altogether.


Architecture Overview:

flowchart TB
  subgraph NZ
    NZUploader[User at NZ Resource]
    DataAgent[DataSync Agent / CLI Tool]
  end

  subgraph "AWS Cloud (Sydney Region)"
    S3Bucket[(Amazon S3 Bucket)]
    MonitorCloudWatch[[CloudWatch / Logs]]
    IntegrityCheck[Lambda / Hash Validator]
  end

  subgraph Optional[Optional: WorkSpaces Upload Method]
    WorkSpace["Amazon WorkSpaces (Sydney)"]
    EBSVolume[EBS Storage]
  end

  NZUploader --> DataAgent
  DataAgent -->|Upload via Internet| S3Bucket
  S3Bucket --> MonitorCloudWatch
  S3Bucket --> IntegrityCheck

  NZUploader -->|RDP| WorkSpace
  WorkSpace --> EBSVolume
  EBSVolume --> S3Bucket

Summary of Trade-offs

Solution Pros Cons
AWS DataSync Enterprise-grade, retry-safe, efficient Requires agent setup and EC2 or on-prem
S3 Transfer Acceleration Fast, easy to enable Can be expensive per GB
AWS Snowball Fully offline, fast, secure Requires Australian pickup, longer turnaround
AWS WorkSpaces (Upload from AU) AU-internal path, RDP controllable User experience and speed depend on NZ internet

Executive Conclusion

The most cost-effective, secure, and scalable approach is to deploy AWS DataSync with a local agent in NZ to upload directly into S3 (Sydney). It ensures data integrity, offers full visibility, and avoids manual transfer risks. For suppliers lacking robust internet, AWS Snowball remains the fallback. WorkSpaces-based upload can serve as an operational workaround but introduces complexity and human error potential.


===


Enterprise-Grade Photo Ingest Strategy

Leveraging Existing AWS Direct Connect + Outposts (Auckland/NZ) to Land Data in S3, ap-southeast-2


0 Context & Assets on Hand

Element Current State
Data volume “Several TB” (assume 10–40 TB in sizing examples)
Sources Supplier hosts in Australia and a resource in New Zealand
Target Amazon S3 in Sydney Region
Enterprise network 1 Gbps (dedicated) AWS Direct Connect New Zealand ⇄ Sydney already provisioned
On-prem AWS footprint AWS Outposts servers in Auckland DC, local 10 Gb LAN
Security posture All traffic must be encrypted in flight and at rest, with audit-quality logging

These assets fundamentally change the economics: the Direct Connect port fee is a sunk cost, and the Outposts servers give us an AWS-native landing zone on-prem.


1 Data Consistency & Integrity – Multi-Layer Controls

Layer Control Rationale
File manifest SHA-256 of every object captured at supplier; signed and stored in Glacier Deep Archive Immutable audit baseline
Transfer engine AWS DataSync in Enhanced mode (agent on Outposts) – calculates checksums at source and destination, retries corrupt chunks automatically End-to-end validation without manual scripting
Landing bucket S3 bucket in Sydney with Versioning, Object Lock (Compliance), SSE-KMS Eliminates overwrite risk; meets retention regs
Post-ingest compare Glue Crawlers or aws s3api list-objects → hash diff against manifest Final reconciliation
Regional DR S3 Replication to ap-southeast-4 (Melbourne) Fulfils BC/DR RPO goals

2 Network Latency, Bandwidth & Timing (Step-by-Step)

  1. Supplier ↔ Outposts (Auckland) Local copy over 10 Gb LAN → wire-speed; negligible latency.

  2. Outposts ↔ AWS Sydney via Direct Connect 1 Gbps dedicated link, RTT ≈ 22 ms (Tasman) – no Internet congestion. Throughput model:

$$ \text{Duration (hrs)}=\frac{\text{TB} \times 8{,}192}{\text{Gbps} \times 3{,}600} $$

Example — 20 TB @ 0.9 Gbps effective ≈ 20 × 8,192 / (0.9 × 3,600) ≈ 50 h.

  3. Throttling protection – DataSync lets us cap the task bandwidth limit, preventing it from saturating the DX circuit used by other workloads.

Result: Even at 20 TB the job finishes in < 3 days without touching the public Internet.


3 Cost Modelling (Incremental Only)

Assume 20 TB, Enhanced‐mode DataSync, 1-month DX port already budgeted.

Path Data-transfer fee Service fee Shipping Total (20 TB)
DataSync over Direct Connect 20 TB × $0.015/GB ⇒ ≈ $307 None None ≈ $307
Snowball Edge 210 TB Data-in free $1,800 service fee (Sydney table) ~$250 return freight ≈ $2,050
S3 Transfer Acceleration 20 TB × $0.08/GB ⇒ ≈ $1,638 (NZ → AU rate) None None ≈ $1,638
WorkSpaces jump-host + CLI Data-in free Desktop $31/mo + usage None Adds cost, no speed gain

Observations With DX already paid for, DataSync is ~6× cheaper than Snowball and ~5× cheaper than S3TA. No cap-ex, no logistics, no risk of physical loss.


4 Deep Option Analysis & Trade-offs

Option Pros Cons When to choose
DataSync + DX (Recommended) • Incremental cost only $0.015/GB
• Block-level checksums, resume, scheduling
• Uses private DX, zero Internet attack surface
• Dependent on DX capacity; pace = 1 Gbps • 1–50 TB bursts; ongoing nightly deltas; regulated data
Snowball Edge • Flat fee, no WAN dependency
• Encrypts (AES-256), TPM-sealed keys
• Handles petabytes
• Logistics, 5–7 day round trip
• Manual chain-of-custody
• Sites without DX or < 100 Mbps WAN; > 50 TB
S3 Transfer Acceleration • No device, quick enable
• Good for > 100 ms RTT
• 5×–8× cost premium
• Gains negligible on Tasman latency
• Global end users far from bucket
WorkSpaces jump host • Familiar desktop UX for ad-hoc admin • Still travels same path
• Adds per-hour VDI charges
• Not purpose-built for bulk ingest
• Only for GUI-heavy validation tasks

5 Step-by-Step Implementation Plan

  1. Create the S3 on Outposts bucket (“nz-photo-staging”) and mount it via NFS/SDK inside the Auckland DC.
  2. Deploy the DataSync agent as a VM on the Outposts server.
  3. Define DataSync locations – Source = Outposts bucket; Destination = the S3 “au-photo-landing” bucket (Versioning + Object Lock) in Sydney.
  4. Schedule a test task (100 GB) → validate throughput & checksum.
  5. Run the production task with the bandwidth limit set to 850 Mbps to leave 15 % headroom on the DX (see the CLI sketch below).
  6. Enable CloudWatch alarms on VerificationErrors > 0 and task duration.
  7. Post-ingest lifecycle rules – after 90 days transition to Glacier Instant Retrieval to trim storage OpEx.
  8. Cleanup – retain the manifest + CloudTrail logs for 7 years for compliance.
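
Steps 3–5 translate roughly to the following CLI calls. A hedged sketch: the location and task ARNs are placeholders for resources created beforehand, and 106,250,000 bytes/s is simply 850 Mbps expressed in the unit DataSync expects.

    # Create the task with in-flight verification and an 850 Mbps bandwidth cap
    aws datasync create-task \
        --name photo-bulk-ingest \
        --source-location-arn arn:aws:datasync:ap-southeast-2:111122223333:location/loc-SOURCE-PLACEHOLDER \
        --destination-location-arn arn:aws:datasync:ap-southeast-2:111122223333:location/loc-DEST-PLACEHOLDER \
        --options VerifyMode=POINT_IN_TIME_CONSISTENT,BytesPerSecond=106250000

    # Kick off an execution (options can also be overridden per run)
    aws datasync start-task-execution \
        --task-arn arn:aws:datasync:ap-southeast-2:111122223333:task/task-PLACEHOLDER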

6 Implementation Diagram

flowchart TB
    subgraph Supplier_NZ["Supplier Resource (NZ)"]
        NAS[(Photo NAS)]
    end
    subgraph OnPrem_Outposts["AWS Outposts – Auckland DC"]
        Stg[S3 on Outposts<br>“nz-photo-staging”]
        Agent["DataSync Agent<br>(VM)"]
    end
    subgraph DirectConnect["1 Gbps Direct Connect<br>Auckland ⇄ Sydney"]
        DX((Private 802.1Q))
    end
    subgraph AWS_Sydney["AWS Sydney (ap-southeast-2)"]
        Landing[("S3 au-photo-landing<br>Ver+Lock+SSE-KMS")]
        Inventory[[Glue / Athena<br>Inventory Reconcile]]
        Glacier[(Glacier IR<br>90-day policy)]
    end
    NAS -->|10 Gb LAN copy| Stg
    Stg --> Agent
    Agent -->|Verified stream| DX --> Landing
    Landing --> Inventory
    Landing -->|Lifecycle| Glacier
    %% Contingency
    NAS -.->|"if WAN slow / >50 TB"| Snowball[Snowball Edge<br>210 TB] -.-> Landing

7 Key Talking Points for Architecture & Procurement Teams

  • Security first – Private DX path + AWS KMS, CloudTrail, Object Lock satisfy CIS & ISO 27001 controls.
  • Cost clarity – Incremental outlay ≈ NZD 500 for 20 TB vs > NZD 3 k for Snowball.
  • Time-to-data – < 72 h door-to-bucket; beats freight cycle by several days.
  • Scalability – Same pipeline supports nightly delta syncs (DataSync incremental) or future PB-scale (add 10 Gbps DX).
  • Fallback – Snowball remains pre-approved contingency should DX circuit saturate or data balloon beyond Outposts capacity.

Executive Recommendation

Adopt a Direct-Connect-optimised DataSync workflow, staging on your existing Outposts in Auckland. It delivers the lowest incremental cost, deterministic throughput, and verifiable data integrity while exploiting assets the business already pays for.

Keep a Snowball Edge Storage-Optimized request templated in the console as a contingency for datasets exceeding ~50 TB or in the event of DX outage.

Decommission the WorkSpaces-based idea for bulk transfer—it adds spend without performance upside.

This approach aligns with the AWS Well-Architected Framework (Cost Optimisation, Security, Operational Excellence) and leverages your enterprise investments to maximum effect.

===

Given the clarified enterprise context—AWS Direct Connect (DX) between New Zealand and Sydney and AWS Outposts deployed on-premises in Auckland—the solution design requires significantly deeper architectural alignment across hybrid cloud, network-path optimization, cost modeling, and data-integrity guarantees.



Context Recap

  • Source Data: Several terabytes of high-resolution photos stored at a supplier facility in New Zealand.
  • Destination: Amazon S3 in ap-southeast-2 (Sydney)
  • Available AWS Infrastructure:

  • AWS Direct Connect (DX): Dedicated 1–10 Gbps private network between NZ (on-prem) and AWS Sydney.

  • AWS Outposts: On-prem hybrid cloud racks located in NZ, managed as AWS resources with access to AWS Region services.
  • Initial Proposal: Snowball Edge (physical transfer)
  • Updated Proposal: Cloud-native upload over network

Step-by-Step Solution Design

1. Data Integrity, Consistency, and Security

Robust data movement requires chain-of-custody, integrity verification, and encryption at rest and in transit.

Component Implementation
Pre-ingest Hash Manifest Supplier computes SHA-256 or MD5 manifest of all source files. Stored securely.
Transfer Protocol Use AWS DataSync (preferred) or S3 Multipart Upload w/ checksum over Direct Connect.
DataSync Verification Each file is checksummed (block-level) at source and destination. Corrupted transfers are automatically retried.
Encryption Use S3 Bucket with SSE-KMS. Data in-transit is encrypted via TLS, and DX uses a private Layer 2/3 circuit.
Audit Trail Enable AWS CloudTrail, DataSync Logs in CloudWatch, and S3 Access Logs for full traceability.
Post-transfer Verification Validate object count and hash manifest match; optionally use S3 Inventory for audit exports.
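
For the post-transfer verification row, a hedged sketch of a daily S3 Inventory feed that can be joined against the supplier manifest; the bucket names and report ID are placeholders.

    # Emit a daily CSV inventory of the landing bucket into the audit bucket
    aws s3api put-bucket-inventory-configuration \
        --bucket au-photo-landing \
        --id photo-landing-daily \
        --inventory-configuration '{
          "Id": "photo-landing-daily",
          "IsEnabled": true,
          "IncludedObjectVersions": "Current",
          "Schedule": {"Frequency": "Daily"},
          "OptionalFields": ["Size", "ETag", "ChecksumAlgorithm"],
          "Destination": {"S3BucketDestination": {"Bucket": "arn:aws:s3:::photo-audit-artifacts", "Format": "CSV", "Prefix": "inventory"}}
        }'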

2. Network Performance: Leverage Direct Connect from NZ to Sydney

The Direct Connect circuit is the most deterministic, secure, and high-bandwidth path for this data flow.

Throughput Scenarios

DX Port Realistic Throughput Transfer Time (10 TB)
1 Gbps ~800 Mbps usable ~28–30 hours
10 Gbps ~8 Gbps usable ~3–3.5 hours

Optimizations

  • Path Segregation: Avoid congestion on shared WAN—use DX Gateway with private VIF directly connected to the VPC in Sydney.
  • Jumbo Frames: Enable 9001 MTU if supported by on-prem LAN and DX router.
  • Parallelization: DataSync performs multi-threaded uploads, saturating pipe efficiently.
  • Fault Recovery: DataSync resumes failed transfers without duplication.

3. Leverage AWS Outposts as Hybrid Edge for Staging and Buffering

With AWS Outposts available on-prem in NZ, additional options become viable:

Outposts Role in This Workflow

Use Case Value
Intermediate Buffer Use Outposts S3 or EBS volumes as staging layer; allows upstream validation before final S3 upload.
Compute Node Run DataSync Agent, or Lambda on Outposts, to perform compute operations (compression, hashing).
Security Controls Benefit from centralized IAM, KMS, and VPC security controls consistent with regional AWS standards.
Audit Zone Apply MFA delete, versioning, and immutable snapshots to satisfy audit/compliance requirements.

Outposts acts as the hybrid control point, enforcing enterprise standards before any data leaves the country, satisfying data residency assurance while leveraging AWS Region elasticity.


4. Cost Analysis & Optimized Model

Cost Model Comparison (10 TB Example)

Option Infra Cost Data Transfer Cost Labor/Handling Total Notes
DataSync over DX (1 Gbps) $0 (infra already exists) $0 (inbound to S3 is free) Negligible ≈ $125 DataSync $0.0125/GB
Snowball Edge $1800 (fixed job) $0 High (shipping/logistics) $1800+ Higher time-to-ingest
S3 Transfer Acceleration N/A ~$800 (0.08/GB) Minimal ~$800+ Worse performance in this case
WorkSpaces $31/month + $0.28/hour Same transfer + DataSync cost Redundant $200+ Not for data-transfer tasks
DX + Native S3 CLI + Hashing $0 infra + $0 bandwidth Free Moderate DevOps effort ~$0 Risk: manual retries, no E2E validation

Best Cost-Performance: DX + AWS DataSync over Private VIF


5. Final Recommendation: Enterprise-Grade Solution

Primary Recommendation: Leverage AWS DataSync over your existing Direct Connect circuit, with Outposts acting as your pre-processing and staging layer.

Architecture Benefits:

  • Low-Cost: ~90% cheaper than Snowball or S3 Transfer Acceleration.
  • Secure: End-to-end TLS + KMS + private routing.
  • High Performance: 800 Mbps – 8 Gbps, depending on DX capacity.
  • Auditable: Checksum validation, CloudTrail logs, S3 inventory.
  • Compliant: Data can be staged on Outposts to meet jurisdictional requirements.
  • Fully Automated: Zero manual retries, no human error.

6. Architectural Diagram (Mermaid)

flowchart TD
    subgraph OnPremNZ["On-prem NZ (Supplier)"]
        LocalStore["Source Files<br>(10+ TB Photo Archive)"]
        Manifest[SHA-256 Hash Manifest]
        Outposts["AWS Outposts<br>(Local Compute + Storage)"]
        DataSyncAgent["DataSync Agent<br>(EC2 on Outposts)"]
    end

    subgraph Private Network
        DX["Direct Connect<br>Private VIF (1-10 Gbps)"]
    end

    subgraph AWSSydney["AWS Region (Sydney)"]
        S3["S3 Landing Bucket<br>(SSE-KMS, Versioning)"]
        KMS["KMS Key (AU Region)"]
        Report[CloudWatch Logs + S3 Inventory<br>+ Hash Verification]
    end

    LocalStore -->|Files| Outposts
    Manifest -->|Hashes| Outposts
    Outposts --> DataSyncAgent
    DataSyncAgent -->|TLS Encrypted Transfer| DX
    DX --> S3
    S3 --> Report
    KMS --> S3

Next Steps for Implementation

Step Owner Tools
Provision DataSync Agent on Outposts Cloud/Infra Team EC2 (Amazon Linux)
Create IAM Role for DataSync Agent Security Lead IAM + KMS policy
Open DX Private VIF to Target VPC NetOps AWS DX Console
Deploy S3 Bucket with Versioning, SSE-KMS CloudOps S3 + KMS
Upload Manifest to S3 Glacier Deep Archive Supplier CLI/SDK
Launch Pilot Transfer (10 GB) DevOps AWS CLI + DataSync Task
Monitor Logs, Throughput, Error Rate DevOps CloudWatch, CW Logs
Execute Full Sync CloudOps DataSync Scheduler
Archive Old Versions or Replicate CloudOps S3 Lifecycle + Cross-Region Replication
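
The final row of the table maps to a single lifecycle call plus the replication rule. A minimal sketch of the lifecycle part; the bucket name is a placeholder and GLACIER_IR is the storage-class identifier for Glacier Instant Retrieval.

    # Transition originals to Glacier Instant Retrieval 90 days after ingest
    aws s3api put-bucket-lifecycle-configuration --bucket au-photo-landing \
        --lifecycle-configuration '{
          "Rules": [{
            "ID": "photos-to-glacier-ir",
            "Filter": {"Prefix": ""},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER_IR"}]
          }]
        }'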

Closing Summary for Decision Makers

Decision Factor Value Delivered
Security TLS + KMS + IAM + private routing; Outposts staging for sovereignty
Cost Lowest among all options (~$125 for 10 TB)
Performance High throughput, no WAN contention
Operational Ease Fully automated, resumable, auditable
Compliance Enables staged handoff + integrity reporting
Scalability Can scale to 100s of TB with DX upgrades

Recommendation: Proceed with DataSync over Direct Connect with Outposts as edge compute. Snowball should be reserved for offline/air-gapped scenarios only.