17. What is Data profiling?
Data profiling, also known as data assessment or data archeology, is the process of examining the data available in a data source and collecting statistics and summary information about it.
The main objective of data profiling is to check the correctness, consistency, and completeness of the data and to ensure that it is free from anomalies. It also checks whether the data follows business rules and identifies any rule violations.
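As an illustration of the statistics-gathering side of data profiling, here is a minimal Python sketch. It assumes the source data has already been loaded as a list of dictionaries, that each column holds values of a single comparable type, and that `None` and empty strings mean "missing". `profile_column` and `profile_table` are hypothetical helper names, not part of any library.

```python
from collections import Counter


def profile_column(values):
    """Collect basic profiling statistics for one column of raw values.

    Assumption: None and "" are treated as missing; real sources may use
    other sentinels such as "N/A".
    """
    present = [v for v in values if v not in (None, "")]
    stats = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if present:
        stats["min"] = min(present)
        stats["max"] = max(present)
        # The most frequent value can reveal defaults or placeholder entries.
        stats["most_common"] = Counter(present).most_common(1)[0]
    return stats


def profile_table(rows):
    """Profile every column that appears in a list of dict records."""
    columns = sorted({col for row in rows for col in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


rows = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 3, "age": 29, "city": "Pune"},
]
report = profile_table(rows)
for column, stats in report.items():
    print(column, stats)
```

In practice a library such as pandas offers similar summaries, but the plain-Python version shows which numbers a profile is made of.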
18. What are the steps involved in data profiling?
Data profiling is an important process because it identifies data problems before they can affect decision-making.
The steps involved in data profiling are as follows:
i.) Prepare a document that sets out the timeline, deliverables, and boundaries of the project. The document helps employees understand the expectations and requirements and prioritize their work.
ii.) Select appropriate analytical and statistical tools to help you assess the quality of the data structure.
iii.) Analyze the data sources.
iv.) Know the scope of the data.
v.) Identify the patterns and formats used in the data, and the values that deviate from them.
vi.) Identify irrelevant values, duplicate values, missing values, and other anomalies present in the data source.
vii.) Check and analyze business rules.
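Steps v to vii can be partly automated. The sketch below shows one possible approach in Python, assuming records are dictionaries with `id`, `email`, and `age` fields. The email pattern and the "age between 18 and 99" rule are hypothetical stand-ins for an organization's real format and business rules.

```python
import re

# Hypothetical format rule: a deliberately loose email pattern.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def find_anomalies(rows, key="id", required=("id", "email", "age")):
    """Return (row_index, issue_type, field) tuples for each problem found."""
    issues = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Missing values in required fields.
        for field in required:
            if row.get(field) in (None, ""):
                issues.append((i, "missing", field))
        # Duplicate values in the key column (missing keys are reported above).
        k = row.get(key)
        if k not in (None, ""):
            if k in seen_keys:
                issues.append((i, "duplicate", key))
            seen_keys.add(k)
        # Format check: a present email must match the expected pattern.
        email = row.get("email")
        if email and not EMAIL_PATTERN.match(email):
            issues.append((i, "bad_format", "email"))
        # Business rule (hypothetical): age must be between 18 and 99.
        age = row.get("age")
        if age not in (None, "") and not 18 <= age <= 99:
            issues.append((i, "rule_violation", "age"))
    return issues


rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "not-an-email", "age": 17},
    {"id": 2, "email": "", "age": 40},
]
for issue in find_anomalies(rows):
    print(issue)
```

Returning a list of issues, rather than stopping at the first problem, gives a complete picture of the source that can be reviewed with business owners.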
19. What is time series analysis?
Time series analysis is used to forecast the outcome of a given process by analyzing its earlier data with methods such as log-linear regression and exponential smoothing. This analysis can be performed in two domains:
i.) Frequency domain
ii.) Time domain
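Exponential smoothing, one of the methods mentioned above, is a time-domain technique. The following is a minimal Python sketch of simple exponential smoothing (no trend or seasonality), using the recurrence s_t = alpha * x_t + (1 - alpha) * s_(t-1). The smoothing factor `alpha` and the sample sales figures are illustrative assumptions.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    smoothed = [series[0]]  # initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def forecast_next(series, alpha):
    """One-step-ahead forecast: the last smoothed value."""
    return exponential_smoothing(series, alpha)[-1]


sales = [10, 12, 14]
print(exponential_smoothing(sales, 0.5))  # [10, 11.0, 12.5]
print(forecast_next(sales, 0.5))  # 12.5
```

A larger `alpha` reacts faster to recent changes; a smaller one produces a smoother, slower-moving forecast.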
20. What is the process of data analysis?
Data analysis is a process that supports decision-making. The main steps are as follows:
i.) Define your goal: Data analysis can be lengthy and time-consuming, so it is essential not to waste time collecting wrong or irrelevant data. Knowing the objective of data collection in advance makes the process easier.
ii.) Set your measurement priorities: Decide what you want to measure and how you are going to measure it, which factors to include, and what time frame you are looking at. Do this before collecting any data.
iii.) Collection of data: Once you have decided what to measure, collect the data and keep it organized as you go. You may use surveys, interviews, or past records to collect the required data.
iv.) Data scrubbing or cleansing: Data scrubbing is the process of removing incorrect, duplicate, incomplete, or redundant data. It is essential for obtaining clean, high-quality data.
v.) Analyze the data: To analyze the data, you can use methods such as data mining, data visualization, and business intelligence. Many tools and software packages are available to support this step.
vi.) Interpretation of results: Interpreting the results answers your questions and helps you achieve your goal. It may also show that more information is required or that further research is needed. The data must be interpreted correctly so that it leads to correct decisions.
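The scrubbing and analysis steps can be illustrated with a small Python sketch. It assumes sales records with hypothetical `order_id`, `region`, and `amount` fields, and a hypothetical rule that amounts cannot be negative; a real project would apply its own rules and would often use a library such as pandas instead.

```python
from statistics import mean


def scrub(records):
    """Data scrubbing sketch: normalise text, drop incomplete, incorrect and duplicate rows."""
    cleaned, seen_ids = [], set()
    for rec in records:
        order_id = rec.get("order_id")
        # Normalise region names: trim spaces and use consistent capitalisation.
        region = (rec.get("region") or "").strip().title()
        amount = rec.get("amount")
        if order_id is None or not region or amount is None:
            continue  # incomplete record
        if amount < 0:
            continue  # incorrect value under the hypothetical "no negative sales" rule
        if order_id in seen_ids:
            continue  # duplicate order
        seen_ids.add(order_id)
        cleaned.append({"order_id": order_id, "region": region, "amount": amount})
    return cleaned


def average_by_region(records):
    """Analysis step: average sale amount per region."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["region"], []).append(rec["amount"])
    return {region: mean(amounts) for region, amounts in groups.items()}


raw = [
    {"order_id": 1, "region": " north ", "amount": 100},
    {"order_id": 1, "region": "North", "amount": 100},  # duplicate
    {"order_id": 2, "region": "south", "amount": -5},  # incorrect
    {"order_id": 3, "region": "", "amount": 50},  # incomplete
    {"order_id": 4, "region": "NORTH", "amount": 300},
    {"order_id": 5, "region": "south", "amount": 80},
]
clean = scrub(raw)
print(average_by_region(clean))  # {'North': 200, 'South': 80}
```

Without the scrubbing step, the duplicate and negative rows would distort both averages, which is why cleaning comes before analysis.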
21. Which data analysis software are you comfortable working with?
A data analyst needs good computer and technical skills, including knowledge of statistical and scripting languages, and a good understanding of reporting and data visualization software. Advanced knowledge of MS Excel is also very beneficial.
With this question, the interviewer is trying to understand which software you are well versed in and in which areas you might need training. To answer this question, you might say: "I am good with computers and have experience with SQL. I am well versed in the functions of MS Excel, and I use ABC software for data visualization at my current employer. If there is any specific software you use here, I'll be happy to learn it. I'm a quick learner."
22. What characteristics define the quality of a data model?
The characteristics that define the quality of a data model are as follows:
i.) Accuracy: The data must convey the right message without being misleading.
ii.) Reliability: The data must be consistent, without any contradictions.
iii.) Timeliness: The data must be collected at the right time so that it is current enough to serve its purpose.
iv.) Completeness: The data must be complete, as gaps in the data will not give the correct picture.
v.) Availability: The data must be available, and the analyst must be granted access to it to perform their duties.
vi.) Uniqueness: Each record must be unique, with no duplicate entries for the same entity.
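Some of these characteristics can be measured as simple ratios. The Python sketch below is one possible way to score completeness, uniqueness, and timeliness. It assumes uniqueness means each key value appears once and that a record is "timely" if it was updated within a chosen number of days; the field names, dates, and 90-day threshold are hypothetical.

```python
from datetime import date


def quality_scores(rows, key, required, date_field, as_of, max_age_days):
    """Return completeness, uniqueness and timeliness as fractions of rows (0 to 1)."""
    n = len(rows)
    if n == 0:
        raise ValueError("no rows to score")
    # Completeness: rows in which every required field is filled in.
    complete = sum(
        all(row.get(f) not in (None, "") for f in required) for row in rows
    )
    # Uniqueness: distinct key values relative to the number of rows.
    unique = len({row.get(key) for row in rows})
    # Timeliness: rows updated no more than max_age_days before as_of.
    timely = sum(
        row.get(date_field) is not None
        and (as_of - row[date_field]).days <= max_age_days
        for row in rows
    )
    return {
        "completeness": complete / n,
        "uniqueness": unique / n,
        "timeliness": timely / n,
    }


rows = [
    {"id": 1, "name": "Asha", "updated": date(2024, 1, 10)},
    {"id": 2, "name": "", "updated": date(2024, 1, 5)},
    {"id": 2, "name": "Ravi", "updated": date(2023, 6, 1)},
    {"id": 3, "name": "Meera", "updated": date(2023, 9, 1)},
]
scores = quality_scores(rows, key="id", required=("id", "name"),
                        date_field="updated", as_of=date(2024, 1, 31),
                        max_age_days=90)
print(scores)  # {'completeness': 0.75, 'uniqueness': 0.75, 'timeliness': 0.5}
```

Accuracy, reliability, and availability are harder to reduce to a single formula, since they usually require comparison with a trusted reference or knowledge of access policies.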
23. Tell us about a time when you could not meet the deadline.
Data analysts need to adhere to the given timelines, which are predefined and agreed on by the analyst. At times, however, they are unable to meet them. With this question, the interviewer is trying to understand how well you handle such situations and whether you are able to find a solution.
To answer this question, you can say something like: "When working on a project where I had to look for data, I had difficulty getting information from a certain source. I proactively contacted the client and explained the problem and the measures we were taking to get the correct information. I explained to him that this process would take more time. The client understood the situation, and we were able to extend our deadline by a week."