Many programming languages are used in data science, including the following:

High-level Languages

  • R – a free programming language and software environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques and is highly extensible. R is popular across research domains for developing statistical software, performing data analyses, and producing publication-quality graphics.
  • Python – a free, general-purpose programming language that emphasizes code readability. It supports object-oriented programming and, beyond data analysis, is widely used for web and application development.
  • MATLAB – a numerical computing environment and proprietary programming language, used most commonly in engineering, physics, and economics.
  • Julia – a free, dynamically typed programming language designed for numerical analysis and scientific computing.
  • Scala – a general-purpose language that combines object-oriented and functional programming. It runs on the Java platform and provides interoperability with Java.
  • SAS – “Statistical Analysis System,” a proprietary software suite for data analytics. It is used by researchers in many domains to munge, mine and analyze data from a variety of sources. It is particularly popular in the health sciences.

Low-level Languages

Use-specific Languages

  • JavaScript – in data science, used primarily for interactive visualizations, dashboards, and mashups on web pages.
  • UNIX shell – an interactive command language and scripting language for controlling the execution of programs on the operating system.
  • Regular expressions – sequences of characters that define a search pattern, used to find, match, or replace strings within text.
  • SQL – Structured Query Language, used to query and manage data in a database; it is the standard language for most relational database management systems.
  • Pig – a platform used with Apache Hadoop for expressing complex data transformations in its scripting language, Pig Latin.
  • XPath – XML Path Language, a query language for selecting nodes (parts) of an XML document.
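Several of these use-specific languages are typically embedded inside a general-purpose language rather than used on their own. The sketch below, written in Python with standard-library modules only, shows three of them side by side: a regular expression via `re`, an SQL query against an in-memory SQLite database via `sqlite3`, and an XPath-style path via `xml.etree.ElementTree`. Note that ElementTree supports only a limited subset of XPath. The text, table schema, and XML element names are all hypothetical.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# --- Regular expressions: extract YYYY-MM-DD dates from free text (hypothetical text)
text = "Samples collected on 2021-03-14 and 2021-04-02; reanalyzed 2022-01-10."
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

# --- SQL: average values per site in an in-memory SQLite database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (site TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements (site, value) VALUES (?, ?)",
    [("A", 1.5), ("A", 2.5), ("B", 4.0)],
)
site_means = conn.execute(
    "SELECT site, AVG(value) FROM measurements GROUP BY site ORDER BY site"
).fetchall()
conn.close()

# --- XPath: select the names of CSV datasets from an XML document (hypothetical document)
# ElementTree implements only part of XPath, e.g. child paths and [@attr='value'] predicates.
xml_doc = """<catalog>
  <dataset format="csv"><name>weather</name></dataset>
  <dataset format="json"><name>traffic</name></dataset>
  <dataset format="csv"><name>sales</name></dataset>
</catalog>"""
root = ET.fromstring(xml_doc)
csv_names = [n.text for n in root.findall("./dataset[@format='csv']/name")]

if __name__ == "__main__":
    print(dates)
    print(site_means)
    print(csv_names)
```

For full XPath 1.0 support, a third-party library such as lxml is commonly used instead of ElementTree.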