For anyone who wants to work in Data Science, the two main areas of knowledge you need are statistics and programming.
Nowadays, we have a wide variety of programming languages to choose from. Besides Python and R, which we talk about a lot, I have found that the more those of us studying Data Science learn, the more languages we need to use.
This list covers the languages I have used (or been forced to learn T_T) over the past year. The comments are based on real experience, although I have used some of these languages for only a few months. Let's see what they are.
1) Python
   Advantages of Python:
2) R
   Advantages of the R language:
3) Unix
4) Java / Scala
5) SQL
6) Hive / Pig
Also, are there any other data science languages?
What language do you want to do in Data Science?
1) Python
The name Python literally refers to the python snake. (But actually, the name comes from the British comedy group Monty Python, of which the language's creator is a fan. Thank you to Mr Theerapong for the information.)
Python was the first language I picked up when I started learning Data Science, because I was taking an Algorithms and Data Structures course at the time.
Advantages of the Python language:
Easy to understand and suitable for beginners.
Well suited to Data Science thanks to good packages such as Pandas (for data wrangling), Scikit-learn (for building machine learning models), and TensorFlow (for deep learning).
Very useful on the Data Engineering side as well: PySpark lets us connect to Spark on a Hadoop cluster, and Airflow lets us schedule and run Big Data jobs in Python.
Large companies tend to release Python libraries, because the language is flexible enough to be used in many ways and is useful to many parts of a company.
Not much different from other programming languages, so people with a programming background tend to prefer Python.
General purpose: it can be used to build websites, write bots, automate tasks, and so on, so it connects easily with other systems.
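To give a feel for why Python is considered easy to read, here is a minimal sketch that uses only the standard library. The sample data and the function name `average_by_group` are made up for illustration. For real data wrangling you would normally reach for Pandas, which offers the same kind of group-and-average step on a DataFrame.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample data, inlined so the example runs on its own.
SAMPLE_CSV = """city,sales
Bangkok,120
Chiang Mai,80
Bangkok,100
Chiang Mai,60
"""


def average_by_group(csv_text, key, value):
    """Return the mean of `value` for each distinct `key` in the CSV text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[key]].append(float(row[value]))
    return {name: mean(values) for name, values in groups.items()}


averages = average_by_group(SAMPLE_CSV, "city", "sales")
print(averages)
```

Even without knowing the language, most readers can follow what each line does, which is the point being made about Python here.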
I recently watched a video about doing Data Science at Stripe, a well-known international payment gateway company, where the whole team uses Python because the code can be taken to production.
If you are interested in learning Data Science with Python, I recommend the video course Intro to Data Science by Udacity.
2) R
R is a language I learned after Python, because I took a course called Modeling for Data Analysis and all of the course code was in R T_T.
Advantages of the R language:
Easy to understand. For newcomers who want to do Data Science it is an even better fit than Python, because many statistical commands are built in and need no additional package.
RStudio, the program for writing R, is really good. It has a code window, a variable window, a console, and a plot window all on one screen. It is so good that someone developed an RStudio-like program for Python.
There are many great add-on libraries: ggplot2, currently the best library for data visualization, ready-made libraries for building machine learning models, and a great package collection called Tidyverse, which I have written about before.
Microsoft now fully supports the R language. After acquiring Revolution Analytics, a company that built R products, it released Microsoft R Open, an R distribution that supports very fast multithreaded processing.
For those who are interested in learning R, last week we introduced 6 free basic R learning resources for beginners (videos and e-books, in English and Thai).
3) Unix
Unix refers to the black screen with green text that we see a lot in hacker movies. Even though there are many beautiful GUI programs out there nowadays, there is one area where they cannot beat a text-only black screen: speed in handling data files.
Now we have something called Big Data, meaning massive amounts of data. Some files hold so much information that any program that tries to open them freezes. We can take advantage of Unix's speed to open, read, and edit these files.
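Command-line tools cope with huge files largely because they stream through the data instead of loading everything into memory at once. The same idea can be sketched in Python. The helper names `count_lines` and `first_lines` are made up for this example, and they only loosely mirror what the Unix commands `wc -l` and `head` do.

```python
import itertools
import os
import tempfile


def count_lines(path):
    """Count lines by streaming the file, roughly like `wc -l`."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def first_lines(path, n=5):
    """Return the first n lines without reading the rest, roughly like `head`."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, n)]


# Build a small stand-in for a "big" file so the sketch is runnable.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("id,value\n")
    for i in range(1000):
        tmp.write(f"{i},{i * 2}\n")
    sample_path = tmp.name

total = count_lines(sample_path)
preview = first_lines(sample_path, 3)
os.remove(sample_path)

print(total)
print(preview)
```

Because neither function keeps more than one line in memory at a time, the same code works on a file far larger than the available RAM, which is where GUI programs tend to freeze.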
For anyone who wants to study Unix, it is not difficult. You can read the Facebook post in which I listed all the Unix commands needed for people working in data.
4) Java / Scala
The next problem with Big Data is that there is too much data for conventional databases to store and process at all. Hadoop was created to support such large amounts of data. It was built on the idea of MapReduce that Google had previously published.
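The MapReduce idea can be illustrated with a word count, its usual introductory example. This is a single-machine sketch in plain Python, not Hadoop's own Java API. The function names are made up for illustration, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in grouped.items()
    }


documents = ["big data is big", "data is everywhere"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts)
```

Each document can be mapped independently and each key can be reduced independently, which is what allows the work to be spread over many machines.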
And because Hadoop is built on Java, Data Engineers have to use Java to talk to it. However, Java is a language with many rules for writing, and something that can be written briefly in other languages takes much more code in Java.
After that, a language called Scala was developed: a functional programming language that is fully interoperable with Java, but that can be written more concisely and work more efficiently.
The reason Scala is increasingly being used in Data Science (Stripe, mentioned above as a Python user, also uses Scala) is that companies today prefer to process data with Apache Spark running on Hadoop, and Scala works with Spark faster than Python or R connected to Spark.