Technologies
There are many programming languages used in Data Science, including but not limited to:
High-level Languages
- R – a free programming language and software environment for statistical computing. It provides a variety of statistical and graphical techniques and is highly extensible. R is popular across research domains for developing statistical software and data analyses, and producing publication quality graphics.
- Python – a free, general-purpose programming language that emphasizes code readability and programmer productivity. It supports object-oriented programming and is also widely used for web and application development.
- MATLAB – a proprietary programming language and computing environment used most commonly in engineering, physics, and economics.
- Julia – a free, dynamic programming language for numerical analysis and computing.
- Scala – a general-purpose language combining object-oriented and functional programming. It runs on the Java platform and provides language interoperability with Java.
- SAS – “Statistical Analysis System,” a proprietary software suite for data analytics. It is used by researchers in many domains to munge, mine and analyze data from a variety of sources. It is particularly popular in the health sciences.
Low-level Languages
Use-specific Languages
- JavaScript – primarily used in visualization, dashboards, and mashups for webpages.
- UNIX shell – an interactive command language and scripting language used to control the execution of the system through commands and shell scripts.
- Regular Expressions – a sequence of symbols and characters that defines a search pattern, used to find a string or pattern within text.
- SQL – Structured Query Language is used to communicate with a database and is the standard for many relational database management systems.
- Pig – used with Apache Hadoop for complex data transformations.
- XPath – XML Path Language is a syntax for selecting parts (nodes) of an XML document.
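
As a rough illustration of three of the use-specific languages above, the following Python sketch runs a regular expression, a SQL query, and an XPath-style lookup using only the standard library. The sample sentence, the `samples` table, and the `catalog` XML are made up for illustration, and `xml.etree.ElementTree` supports only a limited subset of XPath, so this is a sketch rather than a full treatment of any of the three.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Regular expression: find four-digit numbers (years) in free text.
text = "Surveys were run in 2015, 2018 and 2021."  # hypothetical sample text
years = re.findall(r"\b\d{4}\b", text)

# SQL: query an in-memory SQLite database.
# The table name and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (site TEXT, value REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("north", 1.5), ("north", 2.5), ("south", 4.0)],
)
site_means = conn.execute(
    "SELECT site, AVG(value) FROM samples GROUP BY site ORDER BY site"
).fetchall()
conn.close()

# XPath: select the titles of datasets whose format attribute is 'csv'.
# ElementTree implements only a subset of XPath; the XML is hypothetical.
doc = ET.fromstring(
    "<catalog>"
    "<dataset format='csv'><title>Rainfall</title></dataset>"
    "<dataset format='json'><title>Traffic</title></dataset>"
    "</catalog>"
)
csv_titles = [t.text for t in doc.findall("./dataset[@format='csv']/title")]

print(years)
print(site_means)
print(csv_titles)
```

In practice each of these is usually embedded in a host language in this way: the pattern, the query, and the path are passed as strings to a library that interprets them.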