
Author Topic: Timeline of building a datawarehouse  (Read 16525 times)

jingzzsaccount

  • Core Package
  • Newbie
  • *
  • Posts: 17
  • Karma: +4/-0
Timeline of building a datawarehouse
« on: April 07, 2013, 10:11:31 AM »
Hi Laotulaoshi and experienced BI classmates,

I started work 4 weeks ago. I like my coworkers and I like my work. I have spent the past 4 weeks familiarizing myself with my company's database. Now they want me to build OLAP for them, and I need to come up with a timeline next week. The problem is that their OLTP data are really dirty: there are a lot of bad data (e.g., duplicate records, lots of typos, missing values, some attributes with inconsistent referential integrity), and I am not sure how much time is needed to clean the OLTP data. Also, I am not sure how long it will take to gather all the business requirements from the different users. Should data cleaning and meetings with business users be conducted in parallel or sequentially? The company also has a lot of server problems; there have been many disruptions in access to programs and files at work, which makes it even harder to predict how long it will take to complete the first cube.
I would appreciate it if any of you could share your experience.

Thanks in advance,
Jing 

guoz100

  • Core Package
  • Jr. Member
  • *
  • Posts: 66
  • Karma: +1/-0
Re: Timeline of building a datawarehouse
« Reply #1 on: April 08, 2013, 04:26:36 AM »
Whether data cleaning and meeting with business users should be conducted in parallel or sequentially?

Well, that depends on the priority, and the priority is decided by your boss. If a user has a reporting project with an upcoming deadline, then it is more important to meet with that user, clean the data for his report first, and finish the project before the deadline. So the first thing I would do is collect all the deadlines from the different users and projects, then give the deadline list to my boss. Let him know that there is a lot of dirty data in the database and that we need to clean those tables first to produce the reports. Let him know how long you think it will take to clean the dirty data. (You can estimate the time by running some tests.) Then let him decide what should be done first, and ask him to decide whether you should do it sequentially or in parallel.
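One way to "estimate the time by running some tests" is to profile a small sample of rows for the problems Jing listed (duplicate records, missing values, broken references) and scale the counts up to the whole table. The following is only a rough sketch: the field names, the sample rows, the table size of 1000 rows, and the 2 minutes per fix are all made-up assumptions, and a row that is dirty in two ways is counted twice, so treat the result as a loose upper bound.

```python
from collections import Counter


def profile_sample(rows, key_fields, required_fields, fk_field, valid_fk_values):
    """Count three kinds of dirty rows in a sample of source records."""
    # Normalize the key so case/whitespace differences still count as duplicates.
    keys = [
        tuple(str(r.get(f, "")).strip().lower() for f in key_fields) for r in rows
    ]
    duplicates = sum(c - 1 for c in Counter(keys).values() if c > 1)
    # A row is "missing" if any required field is None or an empty string.
    missing = sum(
        1 for r in rows if any(r.get(f) in (None, "") for f in required_fields)
    )
    # A row is an "orphan" if its foreign key is not in the reference table.
    orphans = sum(1 for r in rows if r.get(fk_field) not in valid_fk_values)
    return {"duplicates": duplicates, "missing": missing, "orphans": orphans}


def estimate_cleaning_hours(profile, sample_size, total_rows, minutes_per_fix):
    """Scale the sample's dirty-row counts to the full table, converted to hours."""
    projected_dirty = sum(profile.values()) * total_rows / sample_size
    return projected_dirty * minutes_per_fix / 60


# Hypothetical sample rows; real ones would come from the OLTP tables.
sample = [
    {"customer_id": 1, "name": "Ann Lee", "email": "ann@example.com", "region_id": 10},
    {"customer_id": 2, "name": "ann lee ", "email": "ann@example.com", "region_id": 10},
    {"customer_id": 3, "name": "Bob Wu", "email": "", "region_id": 20},
    {"customer_id": 4, "name": "Cy Tan", "email": "cy@example.com", "region_id": 99},
]

profile = profile_sample(
    sample,
    key_fields=["name", "email"],
    required_fields=["email"],
    fk_field="region_id",
    valid_fk_values={10, 20},  # assumed contents of the region reference table
)
hours = estimate_cleaning_hours(
    profile, sample_size=len(sample), total_rows=1000, minutes_per_fix=2
)
print(profile, hours)
```

Running it on two or three tables of different sizes gives you numbers you can put in front of your boss instead of a guess.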

If there are constant connection disruptions on the server side, then you will need more time than you estimated to clean the dirty data, and you also need to let your boss know about that. He needs to know that it is not your fault and that it is not due to your efficiency. Then give him an updated estimate of the time you need to complete the work. Ask for an extension if there is no way to avoid it.

Basically, don't panic. So far I don't see anything that is your fault.

dandan2

  • Core Package
  • Newbie
  • *
  • Posts: 14
  • Karma: +2/-0
My research: Re: Timeline of building a datawarehouse
« Reply #2 on: April 08, 2013, 07:31:18 AM »
I recommend that Jingzz read the following "DW in 4 Steps" link:
http://dwjunkie.wordpress.com/2011/06/07/a-data-warehouse-in-4-steps/
There is a lot of useful material in it, for example:
1) Dimension Modeling is like Mind Mapping: arrange a WORKSHOP with the business department users first!!! (This is not something one phone call or one meeting can settle. It takes a WORKSHOP, meaning you talk with the users thoroughly, on multiple layers, until you understand all of their requirements, i.e. everything they will want to analyze in the future.) So it is no exaggeration to say BI = MBA + DBA! Don't look at the source system (don't let it constrain you); focus on "how do I want it to be?"
2) Star schema generation: here comes the ERD! A good point: do a PROTOTYPE for business users, so they can look at their Data Warehouse at an early stage (keep them involved!). That way, reporting to your boss will go much more smoothly.
3) Data MAPPING (in contrast to the Mind Mapping of step 1): be sure to consult the Kimball Group's DDD Worksheet (Detailed Dimensional Design Worksheet). Wow, it covers everything you thought of and everything you didn't (each sheet is one table; see the attached example).
This step is the most difficult, because: different names could mean the same thing; data types differ; data sit at different levels of aggregation; some data need to be calculated; some data may not be in the source system... List every difficulty you can think of, ask the others on your team roughly how long each one would take, and then you will get an estimate of the time frame. (Side note: even when consultants quote a client's job, they need to follow a company timeline sheet, so check with your company/director/coworkers to see if they have a timeline idea (or better yet, documentation) for the steps you come up with.)
4) Build your LEGO: cubes and reports. Steps 3 and 4 can run in parallel.
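The time-frame arithmetic from the steps above can be sketched roughly as follows: add up the per-difficulty estimates for step 3, let steps 3 and 4 overlap, and run everything else in sequence. Every number of days here is a placeholder I made up, not a real estimate, and the `buffer_ratio` parameter is my own addition to leave room for things like the server disruptions mentioned earlier in the thread.

```python
def estimate_timeline(step_days, parallel_groups, buffer_ratio=0.0):
    """Total days when steps in a group overlap and the groups run in sequence."""
    total = 0.0
    for group in parallel_groups:
        # Steps done in parallel take as long as the slowest one.
        total += max(step_days[step] for step in group)
    # Optional contingency on top, e.g. 0.25 adds 25%.
    return total * (1 + buffer_ratio)


# Placeholder estimates (in days) for the difficulties listed under step 3;
# in practice these come from asking the rest of the team.
mapping_issues = {
    "different names, same meaning": 3,
    "different data types": 2,
    "different levels of aggregation": 4,
    "calculated data": 3,
    "data not in the source system": 5,
}

step_days = {
    "1 workshops": 5,
    "2 star schema and prototype": 5,
    "3 data mapping": sum(mapping_issues.values()),
    "4 cubes and reports": 10,
}

# Steps 1 and 2 run one after the other; steps 3 and 4 run in parallel.
plan = [
    ["1 workshops"],
    ["2 star schema and prototype"],
    ["3 data mapping", "4 cubes and reports"],
]

total_days = estimate_timeline(step_days, plan, buffer_ratio=0.25)
print(total_days)
```

Keeping the estimates in one table like this also makes it easy to redo the total when a teammate corrects one of the numbers.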


jingzzsaccount

  • Core Package
  • Newbie
  • *
  • Posts: 17
  • Karma: +4/-0
Re: Timeline of building a datawarehouse
« Reply #3 on: April 08, 2013, 05:41:00 PM »
Thanks Dandan and Guoz for your suggestions!

DanDan- thanks for the link, you are very resourceful! I will study the topic more carefully.

Happy Monday!
Jing

s2012

  • Core Package
  • Newbie
  • *
  • Posts: 7
  • Karma: +0/-0
Re: Timeline of building a datawarehouse
« Reply #4 on: May 22, 2014, 08:18:26 PM »
Where can I download the "detailed dimensional design worksheet" provided by the Kimball Group?

Thanks!