Apache Spark is a distributed computing technology. In this article, we explain what it is, cover the key concepts to keep in mind, and offer guidance to help you start using it easily. What is Apache Spark? Apache Spark is a technology that employs...
In this article, we explain how to create a minimal test plan in data engineering. We discuss the importance of ensuring process quality with a detailed and documented test plan: what, how, and why to test. Is It Possible to Code Without Errors? We know it’s...
In this article, we explain how to build a modern data architecture with AWS. We also describe the ecosystem of services AWS offers for data analytics projects. We cover each stage: storage, ingestion, transformation, exploitation, and data visualization. How...
In this article, we explain what a modern data architecture is and what its layers and components are, and we analyze the pros and cons of both the data warehouse and the data lake. What is a data architecture? Data architecture is a combination of technologies that...