Data flow
Last updated:
AZUREDATA
- Data flows in [[202404271503 Extract Transform Load|ETL]] or [[202404271503 Extract Transform Load|ELT]] form.
- Azure Data Factory is a service for orchestrating and controlling these data flows.
flowchart LR
Source --> Extract --> Transform --> Load
flowchart LR
Source --> Extract --> Load --> Transform --> Load2
Data source
A data source delivers data in one of two modes:
- Batch
    - Data is gathered somewhere and then processed in one go.
    - Processed at intervals
    - Large volume of data
    - Some latency is expected
- Serial (streaming)
    - Data keeps arriving, and we process it as it arrives.
    - Usually a hub, such as IoT Hub, collects the data and passes it along.
    - Low latency
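The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under assumed names: `process_batch`, `process_stream`, and the sample `readings` are hypothetical and not part of any Azure API. The batch function needs the whole data set before it can produce a result, while the streaming function updates its result as each value arrives.

```python
from typing import Iterable, Iterator, List


def process_batch(readings: List[float]) -> float:
    """Batch: all data is gathered first, then processed in one go."""
    return sum(readings) / len(readings)


def process_stream(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated running average as each reading arrives."""
    total = 0.0
    for count, value in enumerate(readings, start=1):
        total += value
        yield total / count


# Hypothetical sample data standing in for sensor readings.
readings = [20.0, 22.0, 24.0]

batch_average = process_batch(readings)            # one result, after all data is in
running_averages = list(process_stream(readings))  # one result per arriving reading
```

The batch result is available only once, at the end, which is where the latency comes from. The streaming version yields a usable value after every reading.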
ETL
- ETL stands for Extract, Transform, Load.
- ETL was the usual approach in earlier days, when storage was more expensive.
- Data was transformed first, so only the transformed result went on to the load stage.
- The transform step is one of:
    - Mapping
    - Wrangling
    - Complex (HDInsight, etc.)
- The load stage stores the result in a database or [[202404261931 Azure Data warehouse and analytics]].
- The final phase is analysis.
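As a rough sketch of the ETL order, the snippet below runs extract, transform, and load in sequence using plain Python. Everything here is assumed for illustration: the function names, the sample rows, and the in-memory `warehouse` list standing in for a database. The transform is a simple mapping (rename a field and convert its unit).

```python
from typing import Dict, List


def extract() -> List[Dict[str, str]]:
    # Hypothetical source rows; a real pipeline would read from a file or API.
    return [
        {"id": "1", "temp_f": "68.0"},
        {"id": "2", "temp_f": "212.0"},
    ]


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    # Mapping step: cast the id and convert Fahrenheit to Celsius.
    return [
        {"id": int(r["id"]), "temp_c": (float(r["temp_f"]) - 32) * 5 / 9}
        for r in rows
    ]


def load(rows: List[Dict[str, object]], target: List[Dict[str, object]]) -> None:
    # Only the transformed rows reach the target store.
    target.extend(rows)


warehouse: List[Dict[str, object]] = []  # stand-in for a database table
load(transform(extract()), warehouse)
```

Note that `warehouse` never sees the original `temp_f` values, so any later analysis is limited to what the transform kept.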
ELT
- Storage is cheap now.
- So we load the raw data into something like [[202404121149 Azure Data Lake]] first and transform it afterwards.
- The benefit is that we may want to transform the data in some new way in the future.
- With ETL, only the transformed output is kept, so the raw data is lost and cannot be reprocessed.
- With ELT, the raw data is already in [[202404121149 Azure Data Lake]], so we can use it later as needed.
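A minimal sketch of the ELT idea, again with assumed names: `data_lake` is just a Python list standing in for raw storage, and `to_celsius` and `hot_ids` are hypothetical transforms. Because the raw rows are loaded untouched, a second transform can be added later without going back to the source.

```python
from typing import Dict, List

# Hypothetical raw rows as they arrive from the source.
raw_events: List[Dict[str, str]] = [
    {"id": "1", "temp_f": "68.0"},
    {"id": "2", "temp_f": "212.0"},
]

# Load first: the raw rows are stored as-is (stand-in for a data lake).
data_lake: List[Dict[str, str]] = []
data_lake.extend(raw_events)


def to_celsius(rows: List[Dict[str, str]]) -> List[float]:
    # First transform, written when the pipeline was built.
    return [(float(r["temp_f"]) - 32) * 5 / 9 for r in rows]


def hot_ids(rows: List[Dict[str, str]], threshold_f: float = 100.0) -> List[str]:
    # A new transform added later; it still works because the raw data was kept.
    return [r["id"] for r in rows if float(r["temp_f"]) > threshold_f]


celsius_view = to_celsius(data_lake)
hot_view = hot_ids(data_lake)
```

Both views are derived from the same stored rows, and the rows themselves stay unchanged in `data_lake`.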