PySpark - Data Architect

Virtusa • Dubai, Dubai, United Arab Emirates • Posted June 18, 2026

Location Dubai, Dubai

Job Type Full-time

Category Other-General

Responsibilities

Data Pipeline Development: Design, develop, and maintain highly scalable, optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity.
Ingestion: Implement and manage data ingestion processes from a variety of sources (relational databases, APIs, file systems) into the data lake or data warehouse.
Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support business and analytical needs.
Optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing ETL runtime.
Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the pipeline.
Orchestration: Automate data workflows using Apache Oozie, Apache Airflow, or similar orchestration tools within the Cloudera environment.
...
            
