Requirements:
- At least 5 years of experience designing and delivering scalable BI, ETL/ELT, DWH, data lake, or big data architectures
- Hands-on experience working with data services in an AWS cloud environment
- AWS Redshift (Redshift Spectrum / external tables, stored procedures, performance-driven table design (sort keys, distribution styles, …), materialized views, temporary tables)
- AWS S3 experience (storage classes/tiers, S3 buckets, prefixes, object versioning)
- Apache Spark (AWS Glue or other: AWS EMR, Databricks, Azure Synapse Spark Pools; PySpark)
- Git and Parquet knowledge
- Proficient in both SQL and Python for data processing and analysis
- Hive Metastore (HMS; AWS Glue Data Catalog, Databricks, Apache NiFi, Presto, Apache Atlas, Hortonworks DataPlane, Cloudera Navigator, …)
Additional qualifications that will be an advantage:
- Airflow
- AWS CloudFormation
- Ansible
- Azure Resource Manager
- Chef
- GCP Deployment Manager
- Terraform
- AWS CodePipeline
- Bitbucket Pipelines
- GitHub Actions
- GitLab Pipelines
- Jenkins
- TeamCity
- Travis CI
- AWS Glue
- AWS Lambda
- Azure Functions
- Google Cloud Functions
- Azure Synapse
- Databricks
- Google Cloud BigQuery
- Snowflake
- AWS Step Functions
- dbt
- Delta Lake, Apache Iceberg, or Apache Hudi
- Hadoop Distributed File System (HDFS)
- Scala
- Data Lakehouse
- Data Governance
- Data Quality
- Data Lineage / Data Provenance
- Streaming Data / Real-time Data
- Star Schema / Dimensional Modelling / Kimball
- Data Vault
- Common Data Model / Corporate Data Model
- Master Data Management
We offer:
- B2B contract with a rate of up to 210 PLN net/hour
- 100% remote job
- Additional benefits
- Innovative working environment
- Opportunity to develop professional skills