Reading time: 4 minutes
Data integration from multiple sources in Power BI
27 / 11 / 2019
There are many ways to store data about how an organization works. A typical approach to recording transactional data in an enterprise is to implement a database in which structured data with a specific structure will be stored. By storing data in tables, you can control the method by which it is are stored and establish relationships between other objects so that the data is in the right business context. A well-designed database offers consistency and integrity in the information it contains but requires the user to enter the data to match its models.
At various stages of development, companies decide to implement the tools to store information that the means available in a given period allow. Considering costs and functional criteria, you can choose from several available database engines, including:
- Microsoft SQL Server
However, not all data in an organization is always stored in the database. Despite the presence of many information management systems in an enterprise, sometimes it is more convenient and faster to save some data in the form of an Excel spreadsheet or a PDF file. When deciding on this form of information storage, we usually intend to meet our current short-term needs. For example:
- export of contacts from a Google Account or Microsoft Outlook
- downloading data about customers of our Internet store as a CSV file
- saving an Excel file with sales data received via email and saving it on a network drive (eg. Onedrive)
Data stored in this way can be large and important, it can serve as a supplement to the data stored in a database. In seeking to prepare an analysis of stored data, it may turn out that they both contain similar information, but the structure and specificity of each storage place is different. Every transformation of data to fit into a common model can be overly laborious and insufficiently effective, which can hinder the process of reaching the right information.
Power BI as a data integration tool
The strength of Power BI is its capacity to integrate data from multiple sources of different types. This extensive tool for building interactive analyses allows you to create a common data model and aggregate data from many sources (including):
- Microsoft SQL Server
- IBM Db2
- SAP HANA
• ONLINE SERVICES
- Websites with data
- SharePoint and OneDrive
- Active Directory
- Exchange Online
- Google Analytics
- Facebook sites
- NAV Dynamics
- Excel spreadsheets
- Text and CSV files
- XML file
- JSON file
Power BI allows you to connect to multiple data sources at the same time and design a data model for business analysis. You can add new sources using the Power Query editor built into Power BI. They will be converted into queries, which become templates for tables that will be filled with data after loading them from other sources. When establishing a connection, Power BI recognizes data objects and allows you to select elements:
Then, imported objects are available on the Power Query editor query list, which are transformed according to a set pattern of behavior. At this stage, Power BI allows you to create a set of data conversion operations for the selected query. Thanks to this, each time data is downloaded from sources, the input data will be loaded into the model and transformed step-by-step according to the designed scheme.
Importantly, during data processing, Power BI does not interfere in the structure of the source it uses but performs operations on its own local data set downloaded within the scope set at the extraction stage, i.e. extracting information. Then, it transforms the data by indexing it, changing the data type in the selected column, merging or display order. After designing, the data is loaded into the model in the form of data tables. This feature makes Power BI a great tool in situations where you do not want to make changes to the original source, but to load to it to another place and transform it. This feature is important, especially when our sources differ in the technology on which they are based and/or data structures that they present.
To help you navigate the data sources available in the model, Power BI allows you to determine the origin of data objects. The query dependency view allows you to locate which of our databases, files or internet sources the query will use to create the table and which element in the hierarchy is to be loaded into the data model.
By aggregating data from many sources, reports prepared using Power BI can cover a wider scope than just a single database, thereby allowing you to prepare more comprehensive business analyses. The authorization data for the data sources is saved in the data model and is automatically used to establish a connection when one data loading procedure is called. As a result, data that was previously available in many sources can be easily combined and downloaded to a common set of data processed in the Power BI cloud service or saved locally on the device using the Power BI Desktop application.