7 programming languages that data scientists must know

For those of you who want to work in Data Science, the two main things you need are knowledge of statistics and knowledge of programming.

Nowadays we have a wide variety of programming languages to choose from. Besides Python and R, which we talk about a lot, I have found that the more we learn, the more languages we need to use. This is especially true for people who are studying Data Science.

This list covers the languages that I have been using (or been forced to learn T_T) over the past year. The comments here are based on real experience, although some of the languages I have only used for a few months. Let’s see what they are.


1) Python
The name Python means a python, the snake. (Actually, the name comes from the British comedy group Monty Python, of which the language’s creator is a fan. Thank you, Mr Theerapong, for the information.)
Python was the first language I picked up when I started learning Data Science, because at that time I was taking an Algorithms and Data Structures course.

Advantages of the Python language:
It is easy to understand and suitable for beginners.
It is very well suited to Data Science because it has good packages such as Pandas (for data wrangling), Scikit-learn (for building machine learning models), and TensorFlow (for deep learning).
It is also very useful on the data engineering side, with PySpark, which lets us connect to Spark on a Hadoop cluster, and Airflow, which we use to run big data jobs written in Python.
Large companies like to release Python libraries, because it is a flexible language that can be used in many ways and is useful to many parts of a company.
It is not much different from other programming languages, so people who come from a programming background tend to prefer Python.
It can be used for many purposes, such as building websites, writing bots, and automating tasks, so it connects easily with other systems.
I recently watched a video about doing Data Science at Stripe, a well-known payment gateway company, where the whole team uses Python because the code can be taken to production.
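
To make the point about data wrangling and modeling a little more concrete, here is a minimal sketch of those two steps: cleaning a small dataset and fitting a straight line to it. It deliberately uses only the Python standard library so that it runs anywhere, so it does not show how Pandas or Scikit-learn work internally. The CSV contents and the function names `load_clean_rows` and `fit_line` are made up for illustration.

```python
import csv
import io

# Hypothetical raw data: one row has a missing score and must be dropped.
RAW_CSV = """hours_studied,exam_score
1,52
2,58
3,
4,70
5,76
"""


def load_clean_rows(text):
    """Parse CSV text and drop rows with missing values (the wrangling step)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        if record["hours_studied"] and record["exam_score"]:
            rows.append((float(record["hours_studied"]), float(record["exam_score"])))
    return rows


def fit_line(rows):
    """Least-squares fit of y = slope * x + intercept (the modeling step)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = load_clean_rows(RAW_CSV)
slope, intercept = fit_line(rows)
print(f"score ~ {slope:.1f} * hours + {intercept:.1f}")
```

In real projects Pandas would usually handle the cleaning and Scikit-learn the fitting, but the shape of the workflow is the same.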

If you are interested in learning Data Science with Python, I recommend the video course Intro to Data Science by Udacity.

2) R
R is a language I learned after Python, because I took a course called Modeling for Data Analysis and all of the code in the course was written in R T_T.

Advantages of the R language:
It is easy to understand, and arguably more suitable than Python for newcomers who want to do Data Science, because many statistical commands are built in and need no additional package.
The program for writing R, RStudio, is really good. It has a code window, a variable window, a console, and a plot window all on one screen. It is so good that someone developed a similar program for Python.
There are many great add-on libraries, such as ggplot2, arguably the best data visualization library available today, and ready-made libraries for building machine learning models. There is also a great collection of packages called Tidyverse, which I have written about before.
Microsoft now fully supports R. After acquiring Revolution Analytics, a company that develops R tools, it released Microsoft R Open, a distribution of R that supports very fast multithreaded processing.
For those who are interested in learning R, last week I introduced 6 free resources for learning basic R for beginners (videos and e-books, in English and Thai).

3) Unix

Unix here refers to the black screen with green text that we often see in hacker movies. Even though there are many beautiful GUI programs nowadays, there is one thing in which they cannot beat a text-only black screen: speed in handling data files.

Now that we have Big Data, meaning massive amounts of data, there are files so large that opening them in an ordinary program makes it freeze. We can take advantage of Unix’s speed to open, read, and edit these files.
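
One likely reason command-line tools stay fast on huge files is that they read the data as a stream instead of loading all of it into memory. The sketch below shows the same idea in Python, counting matching lines much as `grep -c` would on the command line. The log file and its contents are hypothetical and are generated on the spot so the example can run.

```python
import os
import tempfile


def count_matching_lines(path, keyword):
    """Stream a file line by line and count lines containing keyword."""
    matches = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # the file object yields one line at a time
            if keyword in line:
                matches += 1
    return matches


# Build a small hypothetical log file so the sketch is runnable end to end.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    for i in range(10_000):
        status = "ERROR" if i % 100 == 0 else "OK"
        tmp.write(f"row {i} status={status}\n")
    sample_path = tmp.name

error_count = count_matching_lines(sample_path, "ERROR")
os.remove(sample_path)
print(error_count)
```

Because only one line is held in memory at a time, the same function works on a file far larger than the available RAM.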

Unix is not difficult to learn. You can read my Facebook post, where I listed all the Unix commands needed for data work.

4) Java / Scala
The next problem with Big Data is that there is so much data that conventional databases cannot store it, let alone process it. To support such large amounts of data, something called Hadoop came along. It was built on the idea of MapReduce, which Google had published earlier.
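
The MapReduce idea can be sketched on a single machine with the classic word-count example. This is only an illustration of the three phases (map, shuffle, reduce), not Hadoop’s actual API, and the function names and sample documents are invented for the example. In a real cluster each phase would be spread across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data is big", "data science needs data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts)
```

The useful property is that `map_phase` looks at one document at a time and `reduce_phase` looks at one key at a time, which is what lets the work be divided among machines.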

Because Hadoop is built on Java, data engineers have to use Java to talk to it. However, Java is a language with many rules, and something that can be written briefly in other languages has to be written at length in Java.

After that, a language called Scala was developed. It is a functional programming language that works fully with Java, but it can be written more concisely and lets us work more efficiently than Java.

The reason Scala is increasingly used in Data Science (Stripe, mentioned above as a Python user, also uses Scala) is that companies today prefer to process data with Apache Spark running on Hadoop, and Scala works with Spark faster than Python or R connected to Spark.


Programming 101

We wrote this article back in 2014, and today we would like to dust it off and rewrite it. With five more years of experience, it should be useful to more people.

How do I start writing a program?

It is a question we get asked almost every day. Many people want to jump into programming or want to know more about computers. Today we will explain how to start if you want to be able to write programs and become a good programmer.

A programmer is not just someone who writes programs, but a problem solver
Given this heading, we might think programming means knowing some alien language that nobody else understands and writing it out without getting confused. In reality, sitting down and writing code is only a small fraction of the work. There is much more to the job than the code writing that everyone sees.

Compare it to building a house. Programming would be the actual construction. But before we can build, there are many steps to go through: designing, checking the design, planning the build, and sourcing materials. These steps take time. Programming is the same. Sitting and writing code is not where most of the time goes. Most of the time is spent planning and thinking about how to write it better.

So do not make the mistake of thinking that programmers just write programs and are done. Before there is a program to sit down and write, there are many steps. That makes the first step in learning to program not programming at all, but problem solving. Try starting with simple problems from daily life. For example, if we want to go from A to B, how should we travel? Exercises like this help a great deal.
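
The A-to-B question can itself be turned into a small program once the problem has been thought through. The sketch below assumes a made-up map in which each place lists the places reachable in one step, and uses breadth-first search, one common way to find a route with the fewest steps. The map, the place names, and the function name are all hypothetical.

```python
from collections import deque

# Hypothetical map: each place lists the places reachable in one step.
ROUTES = {
    "A": ["C", "D"],
    "C": ["E"],
    "D": ["B"],
    "E": ["B"],
    "B": [],
}


def shortest_route(routes, start, goal):
    """Breadth-first search: return a route with the fewest steps, or None."""
    queue = deque([[start]])  # each queue entry is a path walked so far
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for next_place in routes[path[-1]]:
            if next_place not in visited:
                visited.add(next_place)
                queue.append(path + [next_place])
    return None  # no route exists


route = shortest_route(ROUTES, "A", "B")
print(route)
```

Most of the effort here went into deciding how to describe the map and what "best route" means. Typing the code was the short part.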

Who says basic concepts do not matter?
If you had asked us back in our school days, we would have argued that you just need to be able to write code. But now we know a lot more. We would like to tell our younger selves to pay close attention to mathematics, because it gets used a lot later on, seriously. Studying the various fields of computing is like studying mathematics, only applied mathematics.

Personally, we still believe that being able to program is not just typing code. It requires knowledge of many other things. Many people who study computing like to ask why they should study these things when the time could be better spent elsewhere. We would like you to think carefully about that. We were fortunate that our faculty set aside time for these subjects. It showed us that everything is connected. Whether we work as programmers or do something else with computers, the fundamentals of computing are very important.

A very simple example: learning algorithms is not just about being able to write them, which anyone who can read can do. The question is how we know that a method really gives the correct answer. That is where mathematics comes in, so much of it that we barely survived the course.

Or take networking. If we are going to send data to computers on a network, how do we manage it? That is a routing algorithm. Algorithms are everywhere in computing. Electricity does not flow through a circuit to compute something without one. They really are everywhere.

Language is extremely important
Besides computer skills, another must-have skill is English. We must admit that Thai people are not the ones who invented the major technologies we use. Almost all of them come from English-speaking countries.

Therefore most of the knowledge is not in Thai. What we get has been retold and translated by someone, over and over. Think about it. For someone to tell us about a new technology or a new toy, that person has to take what is written in English and translate it for us to read, right?

So if you want to be that person, another thing you must have is language. It unlocks a great deal of the world. It lets us get knowledge and new toys faster, without waiting for anyone, and lets us be the one who shares new knowledge back with the community.

Do not get attached to one programming language
In the past we were like this too: we liked to say that we could write this language, that this language was no good, that that language was poor. Looking back, no language is really better or worse than another. Otherwise the languages being criticized would have gone unused and died out. Each has pros and cons.

All we have to do is adapt and stay open to learning new languages and new toys that fit our work better, because the language we know may not be able to do some jobs, or may not do them very well.

If you ask us, a language is not hard to learn if we have a good foundation. Anyone who knows several languages will understand that the first language may be difficult and take some time to learn. But as time passes and we learn other languages, it gets easier. We come to know what we need to find out when picking up a new language.

Share what you know back to the community
Finally, when we learn new things we are recipients: we receive new knowledge from the community and improve ourselves day by day. On the other side there are people who are just starting, as we once were. So, having been recipients, we should also take some steps toward becoming givers.

Sharing is not only for other people. When we share what we know, it sparks discussion, and we learn things we never knew from others. It also makes our community stronger, smarter, and closer, so that we keep learning continuously and sustainably.

Starting something new is scary and hard. We understand. We just ask you to be patient, proceed by trial and error, and ask people who know. We are rooting for everyone who has just begun. Have fun with it.

Programming is like riding a bicycle. Once we can do it, we always can. Programming is not a language, it is a way of thinking. Language is just a tool, one that we can learn at any time and have to keep learning, because new languages and new concepts are born every day. Have fun with programming.

What is programming?

Programming means writing or creating instructions that make the computer do what is needed.

The instructions are written in a language the computer understands (one that can be translated). To write a program, the author must understand the problem-solving process and the solution, as well as the vocabulary, grammar, and rules of the chosen language. See programming language.

A programming language is a language structured for writing instructions that make a computer work, much as notes are the language of music. Programming languages range from low-level languages, which are close to machine language, to high-level languages, which are close to human language (simple English). See language.

Programming Language One, abbreviated PL/1 (pronounced “P-L-one”), is an older high-level language, used mostly on large computers or mainframes.

Event-driven programming is a style of programming in which the program responds to real events: it waits for the user to press a key or click the mouse before taking any action.
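
As a rough illustration of event-driven programming, the sketch below registers handlers for named events and runs them only when those events arrive. The `EventLoop` class and the event names are invented for this example. A real program would receive its events from the operating system or a GUI toolkit rather than from the simulated calls at the bottom.

```python
class EventLoop:
    """A tiny event dispatcher: handlers wait until their event is emitted."""

    def __init__(self):
        self.handlers = {}  # event name -> list of handler functions
        self.queue = []     # pending (event name, payload) pairs

    def on(self, event_name, handler):
        """Register a handler to be called when event_name occurs."""
        self.handlers.setdefault(event_name, []).append(handler)

    def emit(self, event_name, payload):
        """Record that an event happened."""
        self.queue.append((event_name, payload))

    def run(self):
        """Deliver queued events to their handlers in arrival order."""
        while self.queue:
            event_name, payload = self.queue.pop(0)
            for handler in self.handlers.get(event_name, []):
                handler(payload)


log = []
loop = EventLoop()
loop.on("key_press", lambda key: log.append(f"typed {key}"))
loop.on("mouse_click", lambda pos: log.append(f"clicked at {pos}"))

# Simulated user input for the sake of a runnable example.
loop.emit("key_press", "a")
loop.emit("mouse_click", (10, 20))
loop.run()
print(log)
```

Nothing in the handlers runs until an event is delivered, which is the defining trait of this style.
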
Linear programming is a technique for calculating the best result when solving a particular problem. The solution is a set of optimal values, for example the mixing proportions that give the best blend of ingredients: the most value for the least cost.
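
A tiny blending problem shows the idea. Suppose, purely as an assumption for this example, that ingredient x costs 2 per kg and ingredient y costs 3 per kg, the blend must weigh at least 10 kg, and it must contain at least 18 units of a nutrient, where x supplies 1 unit per kg and y supplies 3. The sketch below finds the cheapest blend by checking every corner of the feasible region, which works for small two-variable problems whose optimum exists. Real solvers use methods such as simplex instead.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y >= c. All numbers are made up.
CONSTRAINTS = [
    (1, 1, 10),  # total blend weight of at least 10 kg
    (1, 3, 18),  # at least 18 units of nutrient
    (1, 0, 0),   # x >= 0
    (0, 1, 0),   # y >= 0
]
COST = (2, 3)  # cost per kg of x and of y


def feasible(x, y, constraints, tol=1e-9):
    """True when the point satisfies every constraint."""
    return all(a * x + b * y >= c - tol for a, b, c in constraints)


def cheapest_corner(constraints, cost):
    """Check every intersection of two boundary lines and keep the cheapest."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never intersect
        x = (c1 * b2 - c2 * b1) / det  # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if feasible(x, y, constraints):
            value = cost[0] * x + cost[1] * y
            if best is None or value < best[0]:
                best = (value, x, y)
    return best


best_cost, best_x, best_y = cheapest_corner(CONSTRAINTS, COST)
print(best_cost, best_x, best_y)
```

With these made-up numbers the cheapest blend is 6 kg of x and 4 kg of y, at a cost of 24.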

Modular programming means writing a program as modules, which makes it easier to write and lets several people work on it. A further advantage is that memory can be managed better.

Multiprogramming means that a single computer can run two or more programs over the same period. For example, while one program is being read into memory, the processor processes the data of another program, and the display unit may show the results of an earlier program. (The programs do not actually run simultaneously. Only the different units work at the same time.)

Structured programming is one method of programming. It breaks a program into multiple subprograms or modules, making it easy to understand. The principle is that each section consists of three kinds of statements: statements executed in order, known as sequence; branching statements, known as selection or conditionals, written with IF-THEN-ELSE; and statements that go back and repeat, known as loops, written with DO WHILE. However, if you do not intend to be a programmer, you may not recognize these statements, and there is no need to know about structured programming.
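
The three kinds of statements can be seen together in one short function. This is only an illustrative sketch: the function name, the pass mark, and the sample scores are invented, and Python’s `if`/`else` and `while` stand in for IF-THEN-ELSE and DO WHILE.

```python
def average_of_passing_scores(scores, pass_mark=50):
    """Average the scores that reach pass_mark, or return None if none do."""
    # Sequence: these statements simply run one after another.
    total = 0
    count = 0
    index = 0

    # Iteration: repeat the body while there are scores left.
    while index < len(scores):
        score = scores[index]
        # Selection: choose one branch depending on a condition.
        if score >= pass_mark:
            total += score
            count += 1
        else:
            pass  # failing scores are skipped
        index += 1

    if count == 0:
        return None
    return total / count


result = average_of_passing_scores([40, 60, 80, 30])
print(result)
```

Every part of the function is built from just those three building blocks, with a single entry point and a clear exit.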

Visual programming means programming in which someone who is already proficient with a program can customize its menus themselves, for example by copying old items and assembling them into new ones. Tools for this include Microsoft’s Visual Basic and Borland’s ObjectVision.

Ranking the most in-demand programming languages in banking technology

It is difficult for developers of banking software to get by with only one programming language. Today the market demands developers with ‘full-stack’ knowledge who are skilled in a wide variety of programming languages. On top of that, before passing an applicant’s resume to the employer, a company’s HR department will often scan for one or more specific skills.

For the past three years we have analyzed the latest 12 months of our database to find out which programming languages are mentioned most on our website, both in employer postings and by job applicants. The results show which banking technology skills are most in demand, and also which skills face the most competition.

This year we see that many banks need skilled people, and at the same time the market itself has become more competitive. The same was true in 2017 and previous years: despite a large increase in the variety of roles, competition for roles that require programming language skills is no lower. That may be because banks are now hiring more senior software engineers and technologists from other industries than in previous years. For example, this week J.P. Morgan revealed that more than 40 percent of its senior technology hires came from outside competitor banks.

As a result, certain programming skills are particularly sought after. Here is a list of the most popular programming languages in finance right now.

C++: 15.5 job applicants per job

It is a little surprising that C++ professionals are so well placed in finance careers. Many people think of it as an older language than the others. But it is precisely because of its age that the skill is so desirable today: C++ is still the backbone of legacy systems used by many banks, and younger programmers are not very familiar with it. C++ also still performs well for high-speed trading and for accessing massive amounts of data.

C++ programmers therefore find that this skill has been more useful over the years than they expected.

Python: 26 job applicants per job

Python is used primarily for pricing, risk management, and trade management platforms. It has become one of the key programming languages at investment banks and hedge funds, replacing Java in many areas. The number of jobs requiring Python skills has almost tripled over the past half year. The number of skilled candidates moved from 270 to more than 800, and the number of experienced applicants using Python skyrocketed over the same period.

The last time we did this research, in December 2016, Python had only 14 applicants per job, but now there are 24. At that time supply was said to fall short of demand. Since then the number of jobs has grown, but the number of people with the skill has grown even more, so Python proficiency alone is no longer a competitive edge.

Gina Schiller, managing director of Jay Gaines & Company in New York, said Python is great for building analytical tools and numerical models. This modeling ability has made Python a useful tool in its own right for analysts, investors, and researchers. Python is being taught to analysts and bankers as part of continuing education programs. The idea comes not from the banks but from interested employees themselves.

Although R is difficult to search for in our database, it is often used together with Python and appears to be very popular, though not quite as common. High-frequency, low-latency trading funds use R for statistical calculations in predictive simulation and analysis. A quick scan of job postings shows that R usually appears alongside Python.

Java: 29.8 job applicants per job

Java differs from Python and C++. The number of jobs requiring Java has declined over the past 18 months from 460 to 346, even though banks are more focused on hiring technology professionals. Banks that originally invested in Java are now investing in Python and R, which are faster, more flexible, and easier to use for current projects. Meanwhile, there are many people skilled in Java.

C#: 37.6 job applicants per job

Christian Glover Wilson, vice president of technology and strategy at Tigerspike, said C# is another language that is slightly less popular because of newer trends in the trading landscape, but it is still used for statistical analysis and low-latency tasks. One advantage of C# is that the market does not have many C# experts, so supply does not meet all of the market’s demand.

Matlab: 106 job applicants per job

Used primarily for research, Matlab is also being overtaken by R. Only about 150 jobs require Matlab skills, but there are thousands of applicants with this skill.
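
The ratios in this list can be turned back into rough applicant counts by multiplying the ratio by the number of jobs. The sketch below does this for the two languages where the article gives both numbers (Java with 346 jobs, Matlab with about 150). The results are only implied estimates under that simple assumption, not figures reported by the survey, and the function name is made up.

```python
def implied_applicants(applicants_per_job, job_count):
    """Estimate total applicants as ratio multiplied by number of jobs."""
    return applicants_per_job * job_count


# Inputs taken from the text above; outputs are rough estimates only.
java_applicants = implied_applicants(29.8, 346)
matlab_applicants = implied_applicants(106, 150)

print(round(java_applicants), round(matlab_applicants))
```

The Matlab estimate is consistent with the statement that thousands of applicants are chasing about 150 jobs.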

12 popular programming languages that IT people should know to keep their skills sharp

Programming languages are no longer just for programmers. If you work as a network engineer, admin, storage manager, or in infrastructure management, you should know at least 2-3 of these popular languages, which will be very helpful in your work.

Python

Python is the best programming language for anyone working with IT infrastructure to learn, because it is easy to use. You also do not have to spend time compiling, debugging is quick and easy, scripts can be put to work quickly, and code can be merged with other code smoothly. It is notably used in the popular SDN controllers POX and Ryu.

Java

Java has long been the most popular programming language in the world, with twice as many users as second-place C. It runs on every platform, including Android and IoT devices, and because it has been around for so long there is an enormous amount of learning material. Although it is less easy to learn than Python, Java stands out for its extensive exception handling, so even when code is written incorrectly it is still possible to understand what went wrong. It is a primary language that every IT professional should have.

PowerShell

PowerShell is the most popular language for IT work on Windows. It lets you retrieve information or perform tasks that cannot be done through the standard admin tools. Microsoft has also recently made PowerShell open source, so it can be used on Mac and Linux as well.

Bash

If PowerShell is the shell for Windows, Bash is the main shell for nearly all Linux distributions. It is so essential that Linux administrators cannot do a good job without knowing it, although people are now looking for alternatives that work across platforms, such as Python.

TCL

TCL (pronounced “tickle”) is a language especially for network people. It runs on Cisco routers and many other kinds of network hardware. It is open source and useful for automating network management and security. It also works very well with C.

C

Like Java, C is a general-purpose programming language. It is the second most popular, and led Java several years ago, so there are enormous resources for learning it as well. It is the first language that computer science students are expected to know. Although not as simple as Java or Python, it can control the system directly, runs fast, and consumes few resources. It is the basis of many other programming languages.

C++

C++, as its C-like name suggests, was developed as an extension of C. It is the third most popular language in the world, and it is just as complex and difficult to learn. It has both strengths and weaknesses compared to C.

JavaScript

JavaScript is known as a front-end web development language, but it can also be used on the back end (via Node.js) or for writing automation scripts. It is ranked the 6th most popular language in the world and is growing more popular. It is common to see it listed as a requirement in current IT job postings.

Perl

Perl is often compared to Python as a scripting language. Many Linux systems, especially older ones, can run Perl scripts. It is also frequently used in networking and security applications, and as a server-side scripting language for websites, predating PHP. It is now the 9th most popular language.

PHP

PHP, the hugely popular server-side web language, is also a general-purpose language. Most operating systems support it, and it works very well with SQL databases. It is now ranked 7th in the world. Nowadays web developers are tending to move to Ruby, and automation scripting increasingly relies on Python, but PHP is still a very useful language if your job involves web servers.

Ruby

Ruby is hailed as a “beautiful” and “natural” language, designed to make coding feel free. It is also easy to learn. Although it is only ranked 13th now, it has begun to climb, especially in web development, where it is commonly used with a framework called Rails.

Frenetic

Frenetic is the newest language on this list. Released in 2010, it is designed primarily for software-defined networking applications, in support of the trend toward software-managed IT infrastructure. For now it is useful mainly when you work with OpenFlow-based SDN, but learning a new language will give you a competitive edge in the near future.
