본문 바로가기

카테고리 없음

Introduction To Data Warehousing And Business Intelligence Pluralsight

  1. Pluralsight Sql
  2. Introduction To Data Warehousing And Business Intelligence Pluralsight Online
  3. Pluralsight Ing

Moving data with SSIS is a forgone conclusion for most of us — it is part of SQL Server, it has a interface we are somewhat familiar with, and connects to lots of common data sources like csv, excel, and of course SQL Server. Are our needs changing as our environments become more hybrid — as we introduce cloud services?

Are you still building packages the same way you were 10 years ago? Just as our data sources have grown, the capabilities needed to move data around have increased.

This episode, with Jason Horner and Cathrine Wilhelmsen, challenges some of the thoughts around data movement and gives some things to think about as new tools are available to move data around. For all of us working with data, we know that we work no only with SQL Server but the suite of tools around the SQL Server engine. In this episode, we share some thoughts around SSIS administration, which is the most popular ETL tool in the Microsoft stack. Just like the SQL Server engine, many times the defaults are taken and in this episode we give some thoughts around logging, package execution, and notification with our guest Ravi Kumar. With the every expanding functionality of SSIS, Steve and I found this discussion insightful as we think about supporting an SSIS environment and ways we can educate package developers. Episode Quote“ That’s one of the things I preach to everybody that, keep your data types as narrow as possible. It’ll make SSIS Package run a lot faster.

” Listen to Learn. Why you want to consider SSIS Buffer sizing. The differences between package deployment vs project deployment. The benefits of package deployment.

The various logging types for SSIS packages. Two parameters to tweak for SSISDB cleanup. The validations steps a package goes through for execution.About Ravi KumarBesides being a former SQL Cruiser, Ravi is the BI architect for LMO advertising in Baltimore Maryland.

He is the co-leader of the BI PASS chapter and frequent contributor to SSIS related discussion on the east coast.Untranscribed introductory portion.Carlos: Okay. Ravi welcome to the program.Ravi Kumar: Thank you sir.Carlos: It’s great to have you. One of the things that you’ve been talking and are doing I guess professionally is working quite a bit with SSIS. I guess shame on us. We’ve never actually had a topic on SSIS in the history of the SQL Data Partners podcast. I don’t know why its been that long but I thought that you would be a great person to bring on and talk a little bit about From the DBA perspective what we need to know about SSIS.

Data

Some of those moving parts and maybe some of the catches that people can kind of get to get stuck with. Why don’t you start us off and talk a little bit about, can you give us the overview of SSIS and what people are mostly using it for.Ravi Kumar: SSIS is an enterprise-level tool that has to do with moving data in an out. It’s an ETL to extract transform and load the data. People have been using it for a while. It’s been around since 2005. Before 2005 back into SQL server 2000 it was called DTS. It became a lot more interactive and enterprise-level 2005 and have been improving it ever since then.

Most of the organizations use it for moving there in and out and to load the data warehouses and to do migrations from maybe Oracle to SQL server or some inaudible 00:02:14 to server or SQL server to some other data base.Currently how we use it, we have at our organization data coming from probably 15 different sources and it goes out to about 5 different other companies. We ingest all the data, we do manipulations for that. We apply business logic to it and it gets sent out. We have certain processes that bring in data every 5 minutes, 24/7. It could be SSIS, it could be very minor where you have 1 or 2 processes running a month and it could be really highly skilled process, where it’s running 24/7 constantly.Steve: Okay. Ravi could you share with us a little bit about your background and how you got started and then how you’ve ended up working with SSIS today?Ravi Kumar: Sure.

I started as a SQL database developer back in 2004 I think. Then it slowly moved in to BI world. Business intelligence usually is 3 parts. As reporting, which is SQL server reporting services, our data warehousing which is SQL server and inaudible 00:03:31 services. Then moving the data ETL, which is a SQL server integration services. After I started with BI, I started with ETL, moved to data warehousing, moved to reporting, moved back to data warehousing and reporting, now I’m back to reporting and integration services again. It’s the ETL part.

Currently I’m a BI manager and I manage a team of 3 other developers who exclusive work on SSIS.Carlos: 1 additional area in thinking about SSIS and some of the ways you can use it there, is it gets baked in. It’s actually kind of baked into SQL server management studio in a sense as well. Even something as simple as I want to export an Excel spreadsheet into a table or move data from one database to another, I can They have that, you can right-click import export data. There was 1 for import 1 for export. You actually go to the process and that will actually create a little SSIS package that doesn’t get stored anywhere or saved unless you take that option. Folks can actually use that to move data in and outRavi Kumar: Again, SSIS comes with a SQL server right? You’ve got a SQL server licensing, standard or enterprise it comes with it.

Once you go into that option it actually creates an SSIS package in the background and executes it and then gets rid of the package. You can save that and execute it again and again and again if you wanted to. Actually I use it for the same thing. Let’s say if somebody says “hey, I just want the data from this table to go to this table and I want to do it every week”.If I try to manually create something like that, it’s going to take me maybe 20 minutes where it says if I go through SSIS and go through the visceral, it could take me 3 minutes and after that I automate it and I’m done.

It has it’s place but it’s a really really small part of SSIS.Carlos: Sure. Fair enough.Ravi Kumar: Yes.Carlos: As we start using those safe packages and we want to put them into our processing or into our systems, there now are 2 modes or 2 ways in which we can deploy or save those packages.Ravi Kumar: Right. Now we’re starting to talk about package deployment vs. Project deployment so there was only 1 way to deploy package up until single Server 2008, R2. In 2012, Microsoft introduced Integration Services Catalog.

When you’re developing a package and visual studio, usually you have, starting 2012 you can have package or project. Package that is basically 1 unit of subset.

Pluralsight Sql

It’s basically in the background as an XML file. You can just deploy it by itself and you can use SQL agent to automate just 1 package itself.There are lots of benefits that you don’t get using package deployment model because it’s just one package.

Introduction To Data Warehousing And Business Intelligence Pluralsight Online

If you are looking at SSIS from a developers point of view, your looking at Let’s say you need to download one file, put it in your database, do some transformation on it and do an export out of it. If you’re good developer, how you would develop it is you would have 1 package, just download the file. 1 package just to load the data, 1 package just to transform it somehow, 1 package just to export the data into a file.If you are using a package deployment model you would have You can still do it but since there’s different different sub sets of it, it just makes us to just to just deploy everything. Let me go over some of the advantages of project deployment. First of all, if you’re using package deployment, you’re going to have store passwords somewhere because you going to move from dev to stage to production right? You got to have some sort of account that needs to access different resources, maybe some file systems.

Pluralsight Ing

You’re going to have to put that password somewhere. What a lot of people did, they use a configuration file up until 2008 and they put that password in plain text within the file which is a big security hole.Carlos: Sure.Ravi Kumar: Once you deploy to catalog you can have those passwords or those sensitive information into a parameter and then you can change it directly on the server and they can deploy from server to server. Let’s say you deploy from staging to production. The first time you change it on production, it will remember that next time so you never have to change it again.

A data warehouse is a database designed to enable business intelligence activities: it exists to help users understand and enhance their organization's performance. It is designed for query and analysis rather than for transaction processing, and usually contains historical data derived from transaction data, but can include data from other sources. Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources. The key characteristics of a data warehouse are as follows:.Data is structured for simplicity of access and high-speed query performance.End users are time-sensitive and desire speed-of-thought response times.Large amounts of historical data are used.Queries often retrieve large amounts of data, perhaps many thousands of rows.Both predefined and ad hoc queries are common.The data load involves multiple sources and transformations.In general, fast query performance with high data throughput is the key to a successful data warehouse. There are important differences between an OLTP system and a data warehouse. One major difference between the types of system is that data warehouses are not exclusively in, a type of data normalization common in OLTP environments.Data warehouses and OLTP systems have very different requirements.

Here are some examples of differences between typical data warehouses and OLTP systems:.WorkloadData warehouses are designed to accommodate ad hoc queries and data analysis. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query and analytical operations.OLTP systems support only predefined operations. Your applications might be specifically tuned or designed to support only these operations.Data modificationsA data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques.

The end users of a data warehouse do not directly update the data warehouse except when using analytical tools, such as data mining, to make predictions with associated probabilities, assign customers to market segments, and develop customer profiles.In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.Schema designData warehouses often use partially denormalized schemas to optimize query and analytical performance.OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and to guarantee data consistency.Typical operationsA typical data warehouse query scans thousands or millions of rows. For example, 'Find the total sales for all customers last month.' A typical OLTP operation accesses only a handful of records. For example, 'Retrieve the current order for this customer.'

.Historical dataData warehouses usually store many months or years of data. This is to support historical analysis and reporting.OLTP systems usually store data from only a few weeks or months. The OLTP system stores only historical data as needed to successfully meet the requirements of the current transaction.

Figure 1-1 Architecture of a Data WarehouseIn, the metadata and raw data of a traditional OLTP system is present, as is an additional type of data, summary data. Summaries are a mechanism to pre-compute common expensive, long-running operations for sub-second data retrieval. For example, a typical data warehouse query is to retrieve something such as August sales.

A summary in an Oracle database is called a.The consolidated storage of the raw data as the center of your data warehousing architecture is often referred to as an Enterprise Data Warehouse (EDW). An EDW provides a 360-degree view into the business of an organization by holding all relevant business information in the most detailed format.