We work in an industry where data is needed for almost everything, so data-related jobs are in very high demand right now. One of the best-known of these jobs, and a popular choice among students, is data science. Since I am currently studying Data Science at my university, I am going to share what you will learn if you choose to study this field.
1. The differences between the three big heroes – Data Analyst, Data Scientist, and Data Engineer – in the tech industry:
The data science field has three main roles: data analyst, data scientist, and data engineer. They are listed here from the entry level to the upper level.
- Data Analyst is an entry-level position in the data science field. This role focuses on three general tasks, seen from the business point of view: accessing data, analyzing data, and presenting data. Accessing data is the first stage, in which you work out how the data should be collected and which kinds of data best fit a particular business scenario. Then, you apply statistical analysis to that data to find answers that are highly relevant to the business questions. Finally, you turn your findings into a story, visualize them, and communicate them to the stakeholders who are in charge of the business.
- Data Scientist is the intermediate-level position in the data science field. This role focuses on building machine learning models or designing statistical algorithms that generate accurate predictions. Normally, it requires at least 3 years of experience in the field. Evaluating the models is extremely important, so you need to spend a lot of time experimenting and coming up with innovative ideas that benefit the business the most.
- Data Engineer is the upper-level position in the data science field. This role focuses on building the data infrastructure. You can think of it as roughly 60-70% software development skill and 30-40% statistical analysis skill. Normally, it requires at least 3 years of experience in the field. The data engineer is also responsible for building data pipelines that speed up the whole process of collecting, analyzing, and reporting data.
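To make the three pipeline stages mentioned above (collecting, analyzing, and reporting data) a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the sample data, the column names, and the function names `collect`, `analyze`, and `report` are all hypothetical, and a real pipeline would read from files, databases, or APIs rather than from a string.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would come from a file, database, or API.
RAW_CSV = """customer,amount
alice,120.50
bob,80.00
alice,40.25
carol,
bob,19.75
"""


def collect(raw_text):
    """Collect: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def analyze(rows):
    """Analyze: skip rows with a missing amount, then total the spend per customer."""
    totals = {}
    for row in rows:
        if not row["amount"]:
            continue  # simple cleaning step: ignore incomplete records
        name = row["customer"]
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return totals


def report(totals):
    """Report: build a short plain-text summary, largest total first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    lines = [f"{name}: {amount:.2f}" for name, amount in ranked]
    mean_total = statistics.mean(list(totals.values()))
    lines.append(f"mean per customer: {mean_total:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report(analyze(collect(RAW_CSV))))
```

Keeping each stage in its own function means one stage can be swapped out (for example, reading from a database instead of a string) without touching the others.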
2. What you will learn in school:
You will learn several things at the beginning of the course. At first, they do not teach anything too technical, such as programming. Instead, they will teach you a lot about statistics: counting, sums, probability, descriptive statistics, and so forth. You will also learn how to use services such as IBM Statistics – one of the most popular services in Australia – to perform statistical analysis on collected data, and how to report your findings in the appropriate statistical format.
Once you are familiar with the basic terms of statistics, you will learn a bit more about collecting data via surveys or other sources and about computing with that data. For the computing part, they will teach you some basic programming in R, Python, or Matlab, depending on the school where you study or plan to study. When you are comfortable with both of these, there are two paths, and you can choose either one or both. In the first, you are placed in a group with other students and apply your knowledge to a statistical project. In the second, you individually go a bit deeper into some popular statistical algorithms from data mining and multivariate analysis.
One optional suggestion: if you do choose to study Data Science, I highly recommend taking some programming-related units. They will help you make more sense of how to work with your data. More importantly, programming skill is necessary for all three big heroes in section 1.
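As a small taste of the basic statistics mentioned in this section (counting, sums, probability, and descriptive statistics), here is a short Python sketch that uses only the standard library. The survey numbers are made up for illustration, and your school may use R, Matlab, or a statistics service instead.

```python
import statistics
from collections import Counter

# Hypothetical survey answers: hours of study per week reported by ten students.
hours = [5, 8, 8, 10, 12, 8, 15, 10, 5, 9]

count = len(hours)                  # how many answers were collected
total = sum(hours)                  # sum of all answers
mean = statistics.mean(hours)       # arithmetic average
median = statistics.median(hours)   # middle value of the sorted answers
mode = statistics.mode(hours)       # most frequent answer
stdev = statistics.stdev(hours)     # sample standard deviation

# Frequency table: how often each answer appears.
frequencies = Counter(hours)

# Empirical probability that a randomly chosen student studies at least 10 hours.
p_at_least_10 = sum(1 for h in hours if h >= 10) / count

print("count:", count)
print("sum:", total)
print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("sample standard deviation:", round(stdev, 2))
print("frequencies:", dict(frequencies))
print("P(hours >= 10):", p_at_least_10)
```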
3. Skills that you need:
I guess you already know which skills you need from reading the two sections above. However, I want to add a few more in this section.
- Communication – if you cannot present your findings to stakeholders in a simple way, your findings are useless. Therefore, this skill is the most important.
- Collaboration – you might need to team up with many people who are responsible for different parts of the work. Thus, collaboration is also important.
- Being creative and curious – data is just data if you do not know how to transform it into a good story. Therefore, you need to be creative and always ask questions about the data.
- Being patient – to find the best fit for a particular scenario, you need to run a lot of experiments. You might also need to repeat one experiment many times with different cases. Patience is what keeps you going through all of that.
- Statistical analysis – this is not only about handling complicated models or statistical algorithms; you also need to understand the data clearly. When you really understand the data, you will be more confident in applying statistical models to it.
- Research thinking – you need to read a lot to keep up with what is going on in the industry. You also need to think independently, from a researcher’s point of view, to find solutions for a particular data-related issue.
- Programming – using services is fine at the beginning. However, when you deal with large data sets or more complex statistical models, you need programming skills to implement the models on different infrastructures and get better performance.
- Technical Writing – reporting your findings is extremely important because it helps other teams, or experts in the industry, understand what you have done or are trying to do. Therefore, you need to know how to write a clear technical report that is based on facts and follows the appropriate statistical format.