Sunday, October 4, 2009

Scripting language

A scripting language, script language or extension language is a programming language that allows control of one or more software applications. "Scripts" are distinct from the core code of the application, which is usually written in a different language, and are often created or at least modified by the end-user.[1] Scripts are often interpreted from source code or bytecode, whereas the applications they control are traditionally compiled to native machine code. Scripting languages are nearly always embedded in the applications they control.[2]
The name "script" is derived from the written script of the performing arts, in which dialogue is set down to be spoken by human actors. Early script languages were often called batch languages or job control languages. Such early scripting languages were created to shorten the traditional edit-compile-link-run process.
A familiar example is a web browser such as Firefox, which is written in C/C++ and can be controlled by scripts written in JavaScript.

Historical overview

The first interactive shells were developed in the 1960s to enable remote operation of the first time-sharing systems, and they generated a demand for scripting to relieve the human operator of the tedium of re-entering sequences of commands at a computer terminal keyboard. From simple macro commands grew files containing sequences of commands, which eventually developed into shell scripts. In a parallel development, larger and more complex applications acquired embedded scripting facilities, at first very rudimentary, to facilitate batch-mode operation, where a human operator would not be present to guide the program. Part of the program was thus devoted to interpreting instructions written by the user in a (usually quite specialized) instruction language — a computer program within a computer program.
Historically, there was a clear distinction between "real" high-speed programs written in languages such as C, and simple, slow scripts written in languages such as Bourne Shell or Awk. But as technology improved, the performance differences shrank, and interpreted and bytecode-compiled languages such as Java, Lisp, Perl and Python emerged and gained in popularity to the point where they are considered general-purpose programming languages and not just languages that "drive" an interpreter.
Languages such as Tcl and Lua were specifically designed as general-purpose scripting languages that could be embedded in any application or used on their own. Other systems, such as Visual Basic for Applications (VBA), provided strong integration with the automation facilities of an underlying system. Embedding such general-purpose scripting languages, instead of developing a new language for each application, also had obvious benefits: it relieved the application developer of the need to code a language translator from scratch and allowed the user to apply skills learned elsewhere.
The Common Gateway Interface (CGI) allowed scripting languages to control web servers and thus to communicate over the Web. Scripting languages used with CGI early in the evolution of the Web include Perl, ASP, and PHP.
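As a rough sketch of the mechanism, a minimal CGI program can be written in Python: the web server runs the script and relays its standard output to the browser as the HTTP response. The file name and server configuration here are assumptions for illustration.
#!/usr/bin/env python3
# hello.cgi - a hypothetical minimal CGI script. The server passes request
# data in environment variables and returns this program's standard output
# to the client as the HTTP response.
import os

print("Content-Type: text/html")   # HTTP header
print()                            # blank line ends the headers
agent = os.environ.get("HTTP_USER_AGENT", "unknown")  # standard CGI variable
print("<html><body>")
print("<p>Hello from a CGI script! Your browser reports itself as:</p>")
print("<p>%s</p>" % agent)
print("</body></html>")
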
Some software incorporates several different scripting languages. Modern web browsers typically provide a language for writing extensions to the browser itself, and several standard embedded languages for controlling page behavior and presentation, including ECMAScript (more commonly known as JavaScript), CSS, and HTML.

Types of scripting languages

Job control languages and shells

A major class of scripting languages has grown out of the automation of job control, which relates to starting and controlling the behavior of system programs. (In this sense, one might think of shells as being descendants of IBM's JCL, or Job Control Language, which was used for exactly this purpose.) Many of these languages' interpreters double as command-line interpreters, such as the Unix shell or the MS-DOS COMMAND.COM. Others, such as AppleScript, offer the use of English-like commands to build scripts. This, combined with Mac OS X's Cocoa frameworks, allows users to build entire applications from AppleScript and Cocoa objects.

GUI Scripting

With the advent of graphical user interfaces came a specialized kind of scripting language for controlling a computer. These languages interact with the same graphic windows, menus, buttons, and so on that a system generates. They do this by simulating the actions of a human user. These languages are typically used to automate user actions or configure a standard state. Such languages are also called "macros" when control is through simulated key presses or mouse clicks.
These languages could in principle be used to control any application running on a GUI-based computer; but, in practice, support for such languages typically depends on the application and operating system. There are a few exceptions to this limitation: some GUI scripting languages recognize graphical objects from their display-screen pixels, and these do not depend on support from the operating system or application.
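As a sketch of the simulated-input approach, the following Python fragment drives the GUI by synthesizing clicks and key presses. It assumes the third-party pyautogui package, and the screen coordinates and text are hypothetical.
import pyautogui  # third-party package that simulates user input

# Simulate a human user: click a (hypothetical) text field, type into it,
# then save with a keyboard shortcut.
pyautogui.click(400, 300)                            # click at x=400, y=300
pyautogui.write("quarterly report", interval=0.05)   # type, 50 ms per key
pyautogui.hotkey("ctrl", "s")                        # press Ctrl+S
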

Application-specific languages

Many large application programs include an idiomatic scripting language tailored to the needs of the application user. Likewise, many computer game systems use a custom scripting language to express the programmed actions of non-player characters and the game environment. Languages of this sort are designed for a single application; and, while they may superficially resemble a specific general-purpose language (e.g. QuakeC, modeled after C), they have custom features that distinguish them. Emacs Lisp, while a fully formed and capable dialect of Lisp, contains many special features that make it most useful for extending the editing functions of Emacs. An application-specific scripting language can be viewed as a domain-specific programming language specialized to a single application.

Web browsers

Web browsers are applications for displaying web pages. A host of special-purpose languages has developed to control their operation. These include JavaScript, a scripting language superficially resembling Java; VBScript by Microsoft, which only works in Internet Explorer; XUL by the Mozilla project, which only works in Firefox; and XSLT, a transformation language that rewrites XML content into a new form. Techniques combining XML and JavaScript scripting to improve the user's impression of responsiveness have become significant enough to acquire a name: AJAX.

Web servers

On the server side of the HTTP link, application servers and other dynamic content servers such as Web content management systems provide content through a large variety of techniques and technologies typified by the scripting approach. Particularly prominent in this area are PHP, JSP and ASP, but other developments such as Ruby on Rails have carved out a niche.

Text processing languages

The processing of text-based records is one of the oldest uses of scripting languages. Scripts written for the Unix tools AWK, sed, and grep automate tasks that involve text-based configuration and log files. Of high importance here is the regular expression, a language developed for the formal description of the lexical structure of text, and used by all of these tools.
Perl was originally designed to overcome limitations of these tools and has grown to be one of the most widespread general purpose languages.
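As a rough Python equivalent of such one-liners, the following sketch uses the regular-expression module to pull failed-login counts out of a log file, the kind of job traditionally given to grep and AWK. The file name and log format are assumptions for illustration.
import re
from collections import Counter

# Count failed-login attempts per host in a hypothetical log file.
pattern = re.compile(r"Failed login .* from (\d+\.\d+\.\d+\.\d+)")
counts = Counter()
with open("auth.log") as log:          # hypothetical file name
    for line in log:
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1

for host, n in counts.most_common():
    print(host, n)
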

General-purpose dynamic languages

Some languages, such as Perl, began as scripting languages but were developed into programming languages suitable for broader purposes. Other similar languages – frequently interpreted, memory-managed, or dynamic – have been described as "scripting languages" for these similarities, even if they are more commonly used for applications programming. They are usually not called "scripting languages" by their own users.

Extension/embeddable languages

A number of languages have been designed for the purpose of replacing application-specific scripting languages by being embeddable in application programs. The application programmer (working in C or another systems language) includes "hooks" where the scripting language can control the application. These languages serve the same purpose as application-specific extension languages but with the advantage of allowing some transfer of skills from application to application. JavaScript began as and primarily still is a language for scripting inside web browsers; however, the standardization of the language as ECMAScript has made it popular as a general purpose embeddable language. In particular, the Mozilla implementation SpiderMonkey is embedded in several environments such as the Yahoo! Widget Engine. Other applications embedding ECMAScript implementations include the Adobe products Adobe Flash (ActionScript) and Adobe Acrobat (for scripting PDF files).
Tcl was created as an extension language but has come to be used more frequently as a general purpose language in roles similar to Python, Perl, and Ruby.
Other complex and task-oriented applications may incorporate and expose an embedded programming language to give their users more control and more functionality than even the most sophisticated user interface can offer. For example, Autodesk's Maya 3D authoring tools embed the MEL scripting language, while Blender uses Python to fill this role.
Some other types of applications that need faster feature addition or tweak-and-run cycles (e.g. game engines) also use an embedded language. During development, this allows features to be prototyped faster and tweaked more freely, without the need for the user to have intimate knowledge of the inner workings of the application or to rebuild it after each tweak (which can take a significant amount of time). The scripting languages used for this purpose range from the more common and more famous Lua and Python to lesser-known ones such as AngelScript and Squirrel.
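To make the hook idea concrete, here is a minimal sketch in Python: the host application exposes a small API to user scripts and executes them in a namespace containing only that API. All names here (move_player, spawn, the script text) are illustrative assumptions, not any particular engine's API.
# Host application side: expose a few "hooks" and run a user script
# in a namespace built around that API.
def move_player(x, y):
    print("engine: moving player to", x, y)

def spawn(kind):
    print("engine: spawning", kind)

api = {"move_player": move_player, "spawn": spawn}

user_script = """
# User-written extension script, interpreted by the host application.
move_player(10, 20)
for _ in range(3):
    spawn("goblin")
"""

exec(user_script, api)   # run the script with the host API as its globals
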

Market analysis

The most popular scripting language, as of 2008, is JavaScript; the second most popular is PHP. Perl is the third most popular scripting language overall, but it enjoys significantly greater popularity in North America.

Syntax (programming languages)

In computer science, the syntax of a programming language is the set of rules that define the combinations of symbols that are considered to be syntactically correct programs in that language. The syntax of a language defines its surface form.[1] Text-based programming languages are based on sequences of characters, while visual programming languages are based on the spatial layout and connections between symbols (which may be textual or graphical).
The lexical grammar of a textual language specifies how characters must be chunked into tokens. Other syntax rules specify the permissible sequences of these tokens; the process of assigning meaning to these token sequences is part of semantics.
The syntactic analysis of source code usually entails the transformation of the linear sequence of tokens into a hierarchical syntax tree (abstract syntax trees are one convenient form of syntax tree). This process is called parsing, by analogy with syntactic analysis in linguistics. Tools have been written that automatically generate parsers from a specification of a language grammar written in Backus-Naur form, e.g., Yacc (yet another compiler compiler).

(Figure: syntax-highlighted Python source code. Syntax highlighting is often used to aid programmers in recognizing elements of source code.)

Syntax definition

(Figure: parse tree of Python code with inset tokenization.)
The syntax of textual programming languages is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur Form (for grammatical structure) to inductively specify syntactic categories (nonterminals) and terminal symbols. Syntactic categories are defined by rules called productions, which specify the values that belong to a particular syntactic category.[1] Terminal symbols are the concrete characters or strings of characters (for example keywords such as define, if, let, or void) from which syntactically valid programs are constructed.
Below is a simple grammar, based on Lisp, which defines productions for the syntactic categories expression, atom, number, symbol, and list:
expression ::= atom | list
atom  ::= number | symbol
number  ::= [+-]?['0'-'9']+
symbol  ::= ['A'-'Z''a'-'z'].*
list  ::= '(' expression* ')'

This grammar specifies the following:
  • an expression is either an atom or a list;
  • an atom is either a number or a symbol;
  • a number is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign;
  • a symbol is a letter followed by zero or more of any characters (excluding whitespace); and
  • a list is a matched pair of parentheses, with zero or more expressions inside it.
Here the decimal digits, upper- and lower-case characters, and parentheses are terminal symbols.
The following are examples of well-formed token sequences in this grammar: '12345', '()', '(a b c232 (1))'
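As a minimal sketch, the grammar above can be turned into a hand-written tokenizer and recursive-descent parser in Python. One liberty is taken: symbols are kept from swallowing parentheses so that lists terminate; error handling for malformed input is omitted.
import re

# Tokenizer for the toy Lisp grammar above. Symbols are restricted to
# non-space, non-parenthesis characters so that ')' can end a list.
TOKEN = re.compile(r"[+-]?[0-9]+|[A-Za-z][^\s()]*|[()]")

def tokenize(text):
    return TOKEN.findall(text)

def parse_expression(tokens, i=0):
    """expression ::= atom | list; returns (tree, index of next token)."""
    tok = tokens[i]
    if tok == "(":                          # list ::= '(' expression* ')'
        i += 1
        items = []
        while tokens[i] != ")":
            item, i = parse_expression(tokens, i)
            items.append(item)
        return items, i + 1                 # skip the closing ')'
    if re.fullmatch(r"[+-]?[0-9]+", tok):   # number ::= [+-]?['0'-'9']+
        return int(tok), i + 1
    return tok, i + 1                       # symbol

tree, _ = parse_expression(tokenize("(a b c232 (1))"))
print(tree)                                 # ['a', 'b', 'c232', [1]]
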
The grammar needed to specify a programming language can be classified by its position in the Chomsky hierarchy. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., they are context-free grammars.[2] However, there are exceptions. In some languages like Perl and Lisp the specification (or implementation) of the language allows constructs that execute during the parsing phase. Furthermore, these languages have constructs that allow the programmer to alter the behavior of the parser. This combination effectively blurs the distinction between parsing and execution, and makes syntax analysis an undecidable problem in these languages, meaning that the parsing phase may not finish. For example, in Perl it is possible to execute code during parsing using a BEGIN statement, and Perl function prototypes may alter the syntactic interpretation, and possibly even the syntactic validity of the remaining code.[3] Similarly, Lisp macros introduced by the defmacro syntax also execute during parsing, meaning that a Lisp compiler must have an entire Lisp run-time system present. In contrast C macros are merely string replacements, and do not require code execution.[4][5]

Syntax versus semantics

The syntax of a language describes the form of a valid program, but does not provide any information about the meaning of the program or the results of executing that program. The meaning given to a combination of symbols is handled by semantics (either formal or hard-coded in a reference implementation). Not all syntactically correct programs are semantically correct: many syntactically correct programs are nonetheless ill-formed per the language's rules, and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit undefined behavior. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it.
Using natural language as an example, it may not be possible to assign a meaning to a grammatically correct sentence, or the sentence may be false:
  • "Colorless green ideas sleep furiously." is grammatically well-formed but has no generally accepted meaning.
  • "John is a married bachelor." is grammatically well-formed but expresses a meaning that cannot be true.
The following C language fragment is syntactically correct, but performs an operation that is not semantically defined (because p is a null pointer, the operations p->real and p->im have no meaning):
complex *p = NULL;                                        /* p is a null pointer */
complex abs_p = sqrt (p->real * p->real + p->im * p->im); /* dereferencing p here is undefined */
 
 

Weak typing

In computer science, weak typing (a.k.a. loose typing) is a property attributed to the type systems of some programming languages. It is the opposite of strong typing, and consequently the term weak typing has as many different meanings as strong typing does (see strong typing for a list and detailed discussion).
One of the more common definitions states that weakly typed programming languages are those that support either implicit type conversion (nearly all languages support at least one implicit type conversion), ad-hoc polymorphism (also known as overloading) or both. These less restrictive usage rules can give the impression that strict adherence to typing rules is less important than in strongly typed languages and hence that the type system is "weaker". However, such languages usually have restrictions on what programmers can do with values of a given type; thus it is possible for a weakly typed language to be type safe. Moreover, weakly typed languages may be statically typed, in which case overloading is resolved statically and type conversion operations are inserted by the compiler, or dynamically typed, in which case everything is resolved at run time.
The claimed advantage of weak typing is that it requires less effort on the part of the programmer than strong typing, because the compiler or interpreter implicitly performs certain kinds of conversions. However, one claimed disadvantage is that weakly typed programming systems catch fewer errors at compile time, and some of these might still remain after testing has been completed. Two commonly used languages that support many kinds of implicit conversion are C and C++, and it is sometimes claimed that these are weakly typed languages. However, others argue that these languages place enough restrictions on how operands of different types can be mixed that the two should be regarded as strongly typed languages.
C++ places more restrictions on implicit pointer conversions than C; code that many C compilers accept with only a warning is rejected outright by a C++ compiler:
void somefunc(void*);
 
void caller(void){
  int bob[5];
  int **billy = &bob;     /* Error in C++; in C a constraint violation that most compilers accept with an incompatible-pointer warning: &bob has type int (*)[5], not int ** */
  somefunc(bob);          /* Valid C and C++: the array decays to int*, which converts implicitly to void* */
  somefunc(billy);        /* Valid C and C++: any object pointer converts implicitly to void* */
  somefunc((void*)billy); /* The explicit cast is accepted everywhere, warning-free */
}

Strongly typed programming language

In computer science and computer programming, the term strong typing is used to describe those situations where programming languages specify one or more restrictions on how operations involving values having different data types can be intermixed. Its antonym is weak typing. However, these terms have been given such a wide variety of meanings over the short history of computing that it is often difficult to know, out of context, what an individual author means when using them.

Interpretation

Most generally, "strong typing" implies that the programming language places severe restrictions on the intermixing that is permitted to occur, preventing the compiling or running of source code which uses data in what is considered to be an invalid way. For instance, an integer division operation may not be used upon strings; a procedure which operates upon linked lists may not be used upon numbers. However, the nature and strength of these restrictions is highly variable.
Benjamin C. Pierce, author of Types and Programming Languages and Advanced Topics in Types and Programming Languages, says, "I spent a few weeks... trying to sort out the terminology of "strongly typed," "statically typed," "safe," etc., and found it amazingly difficult.... The usage of these terms is so various as to render them almost useless." [1] Luca Cardelli's article Typeful Programming describes strong typing simply as the absence of unchecked run-time type errors.[2] In other writing, the absence of unchecked run-time errors is referred to as safety or type safety; Tony Hoare's early papers call this property security.

Meanings in computer literature

Some of the factors which writers have qualified as "strong typing" include:
  • Strong guarantees about the run-time behavior of a program before program execution, whether provided by static analysis, the execution semantics of the language or another mechanism.
  • Type safety; that is, at compile or run time, the rejection of operations or function calls which attempt to disregard data types. In a more rigorous setting, type safety is proved about a formal language by proving progress and preservation.
  • The guarantee that a well-defined error or exception (as opposed to undefined behavior) occurs as soon as a type-matching failure happens at runtime, or, as a stronger special case, the guarantee that type-matching failures never happen at runtime at all (which trivially also satisfies the "no undefined behavior" constraint, since such failures never occur).
  • The mandatory requirement, by a language definition, of compile-time checks for type constraint violations. That is, the compiler ensures that operations only occur on operand types that are valid for the operation.
  • Fixed and invariable typing of data objects. The type of a given data object does not vary over that object's lifetime. For example, class instances may not have their class altered.
  • The absence of ways to evade the type system. Such evasions are possible in languages that allow programmer access to the underlying representation of values, i.e., their bit-pattern.
  • Omission of implicit type conversion, that is, conversions that are inserted by the compiler on the programmer's behalf. For these authors, a programming language is strongly typed if type conversions are allowed only when an explicit notation, often called a cast, is used to indicate the desire of converting one type to another.
  • Disallowing any kind of type conversion. Values of one type cannot be converted to another type, explicitly or implicitly.
  • A complex, fine-grained type system with compound types.

Variation across programming languages

Note that some of these definitions are contradictory, others are merely orthogonal, and still others are special cases (with additional constraints) of other, more "liberal" (less strong) definitions. Because of the wide divergence among these definitions, it is possible to defend claims about most programming languages that they are either strongly or weakly typed. For instance:
  • Java, Pascal and C require all variables to have a defined type and support the use of explicit casts of arithmetic values to other arithmetic types. Java and Pascal are often said to be more strongly typed than C, a claim that is probably based on the fact that C supports more kinds of implicit conversions than Pascal, and C also allows pointer values to be explicitly cast while Java and Pascal do not. Java itself may be considered more strongly typed than Pascal, as means of evading the static type system in Java are controlled by the Java Virtual Machine's dynamic type system.
  • Smalltalk, Ruby, Python, Self, and the LISP family of languages are all "strongly typed" in the sense that typing errors are prevented at runtime, but these languages make no use of static type checking: the compiler does not check or enforce type constraint rules. The term duck typing is now used to describe the dynamic typing paradigm used by the languages in this group.
  • Standard ML, OCaml and Haskell have purely static type systems, in which the compiler automatically infers a precise type for all values. These languages (along with most functional languages) are considered to have stronger type systems than Java, as they permit no implicit type conversions. While OCaml's standard library allows one form of evasion (Obj.magic), this feature remains unused in most applications.
  • Visual Basic is a hybrid language. In addition to including statically typed variables, it includes a "Variant" data type that can store data of any type. Its implicit casts are fairly liberal: for example, one can sum string variants and assign the result to an integer variable.
  • Assembly language and Forth have been said to be untyped. There is no type checking; it is up to the programmer to ensure that data given to functions is of the appropriate type. Any type conversion required is explicit.
  • Ada is statically and strongly typed.
For this reason, writers who wish to write unambiguously about type systems often eschew the term "strong typing" in favor of specific expressions such as "static typing" or "type safety".

Type system

In computer science, a type system may be defined as "a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute."[1] Loosely, a type system associates one (or more) type(s) with each program value; by examining the relation between types and expressions, a type system attempts to prove that no "type errors" can occur. The type system in question determines what constitutes a "type error", but type systems in common use generally seek to guarantee that operations expecting a certain kind of value are not used with values for which that operation makes no sense.
A compiler may use the static type of a value to optimize the storage it needs and the choice of algorithms for operations on the value. In many C compilers the "float" data type, for example, is represented in 32 bits, in accordance with the IEEE specification for single-precision floating point numbers. C thus uses floating-point-specific operations on those values (floating-point addition, multiplication, etc.).
The depth of type constraints and the manner of their evaluation affect the typing of the language. A programming language may further associate an operation with varying concrete algorithms on each type in the case of type polymorphism. Type theory is the study of type systems, although the concrete type systems of programming languages originate from practical issues of computer architecture, compiler implementation, and language design.

Fundamentals

Assigning data types (typing) gives meaning to collections of bits. Types usually have associations either with values in memory or with objects such as variables. Because any value simply consists of a sequence of bits in a computer, hardware makes no distinction even between memory addresses, instruction code, characters, integers and floating-point numbers. Assignment to a type informs programs and programmers how those bit collections should be treated.
Major functions provided by type systems include:
  • Safety - Use of types may allow a compiler to detect meaningless or probably invalid code. For example, we can identify an expression 3 / "Hello, World" as invalid because the rules of arithmetic do not specify how to divide an integer by a string. As discussed below, strong typing offers more safety, but generally does not guarantee complete safety (see type-safety for more information).
  • Optimization - Static type-checking may provide useful compile-time information. For example, if a type requires that a value must align in memory at a multiple of 4 bytes, the compiler may be able to use more efficient machine instructions.
  • Documentation - In more expressive type systems, types can serve as a form of documentation, since they can illustrate the intent of the programmer. For instance, timestamps may be represented as integers—but if a programmer declares a function as returning a timestamp type rather than merely an integer type, this documents part of the meaning of the function.
  • Abstraction (or modularity) - Types allow programmers to think about programs at a higher level than the bit or byte, not bothering with low-level implementation. For example, programmers can think of a string as a collection of character values instead of as a mere array of bytes. Or, types can allow programmers to express the interface between two subsystems. This helps localize the definitions required for interoperability of the subsystems and prevents inconsistencies when those subsystems communicate.
Type safety contributes to program correctness, but cannot guarantee it unless the type checking itself becomes an undecidable problem. Depending on the specific type system, a program may give the wrong result and yet be safely typed, producing no compiler errors. For instance, division by zero is not caught by the type checker in most programming languages; instead it is a runtime error. To prove the absence of more general defects, other kinds of formal methods, collectively known as program analyses, are in common use, as is software testing, a widely used empirical method for finding errors that the type checker cannot detect.
A program typically associates each value with one particular type (although a type may have more than one subtype). Other entities, such as objects, modules, communication channels, dependencies, or even types themselves, can become associated with a type.
A type system, specified for each programming language, controls the ways typed programs may behave and makes behavior outside these rules illegal. An effect system typically provides more fine-grained control than does a type system.
Formally, type theory studies type systems. More elaborate type systems (such as dependent types) allow more correct programs to be accepted as well-typed, but this comes at a price, as type inference and other properties generally become undecidable and type checking itself becomes dependent on user-supplied proofs. It is challenging to find a sufficiently expressive type system that satisfies all programming practices in a type-safe manner. As Mark Manasse concisely put it:[2]
The fundamental problem addressed by a type theory is to insure that programs have meaning. The fundamental problem caused by a type theory is that meaningful programs may not have meanings ascribed to them. The quest for richer type systems results from this tension.

Type checking

The process of verifying and enforcing the constraints of types – type checking – may occur either at compile time (a static check) or at run time (a dynamic check). If a language specification enforces its typing rules strongly (i.e., more or less allowing only those automatic type conversions that do not lose information), one can refer to the process as strongly typed; if not, as weakly typed. The terms are not used in a strict sense.

Static typing

A programming language is said to use static typing when type checking is performed during compile time as opposed to run time. In static typing, types are associated with variables, not values. Statically typed languages include Ada, C, C++, C#, JADE, Java, Fortran, Haskell, ML, Pascal, Perl (with respect to distinguishing scalars, arrays, hashes and subroutines) and Scala. Static typing is a limited form of program verification (see type safety): accordingly, it allows many type errors to be caught early in the development cycle. Static type checkers evaluate only the type information that can be determined at compile time, but are able to verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed. Program execution may also be made more efficient (i.e. faster or using less memory) by omitting runtime type checks and enabling other optimizations.
Because they evaluate type information during compilation, and therefore lack type information that is only available at run-time, static type checkers are conservative. They will reject some programs that may be well-behaved at run-time, but that cannot be statically determined to be well-typed. For example, even if an expression <complex test> always evaluates to true at run-time, a program containing the code
if <complex test> then 42 else <type error>
will be rejected as ill-typed, because a static analysis cannot determine that the else branch won't be taken.[1] The conservative behaviour of static type checkers is advantageous when <complex test> evaluates to false infrequently: A static type checker can detect type errors in rarely used code paths. Without static type checking, even code coverage tests with 100% code coverage may be unable to find such type errors. Code coverage tests may fail to detect such type errors because the combination of all places where values are created and all places where a certain value is used must be taken into account.
The most widely used statically typed languages are not formally type safe. They have "loopholes" in the programming language specification enabling programmers to write code that circumvents the verification performed by a static type checker and so address a wider range of problems. For example, Java and most C-style languages have type punning, and Haskell has such features as unsafePerformIO: such operations may be unsafe at runtime, in that they can cause unwanted behaviour due to incorrect typing of values when the program runs.

Dynamic typing

A programming language is said to be dynamically typed, or just 'dynamic', when the majority of its type checking is performed at run-time as opposed to at compile-time. In dynamic typing, types are associated with values, not variables. Dynamically typed languages include Groovy, JavaScript, Lisp, Lua, Objective-C, Perl (with respect to user-defined types but not built-in types), PHP, Prolog, Python, Ruby, Smalltalk and Tcl. Compared to static typing, dynamic typing can be more flexible (e.g. by allowing programs to generate types and functionality based on run-time data), though at the expense of fewer a priori guarantees: a dynamically typed language accepts and attempts to execute some programs that would be ruled invalid by a static type checker.
Dynamic typing may result in runtime type errors—that is, at runtime, a value may have an unexpected type, and an operation nonsensical for that type is applied. This operation may occur long after the place where the programming mistake was made—that is, the place where the wrong type of data was passed into a place it should not have gone. This makes the bug difficult to locate.
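A small Python sketch of this failure mode: the mistake is made where the value is stored, but the error surfaces only much later, where the value is used.
records = []

def remember(value):
    records.append(value)

remember(3)
remember("4")        # the programming mistake: a string slips in here

# ... much later, possibly in another module ...
total = 0
for r in records:
    total += r       # TypeError raised here, far from where the mistake was made
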
Dynamically typed language systems, compared to their statically typed cousins, make fewer "compile-time" checks on the source code (but will check, for example, that the program is syntactically correct). Run-time checks can potentially be more sophisticated, since they can use dynamic information as well as any information that was present during compilation. On the other hand, runtime checks only assert that conditions hold in a particular execution of the program, and these checks are repeated for every execution of the program.
Development in dynamically typed languages is often supported by programming practices such as unit testing. Testing is a key practice in professional software development, and is particularly important in dynamically typed languages. In practice, the testing done to ensure correct program operation can detect a much wider range of errors than static type-checking, but conversely cannot search as comprehensively for the errors that both testing and static type checking are able to detect. Testing can be incorporated into the software build cycle, in which case it can be thought of as a "compile-time" check, in that the program user will not have to manually run such tests.

Combinations of dynamic and static typing

The presence of static typing in a programming language does not necessarily imply the absence of all dynamic typing mechanisms. For example, Java, and various other object-oriented languages, while using static typing, require for certain operations (downcasting) the support of runtime type tests, a form of dynamic typing. See programming language for more discussion of the interactions between static and dynamic typing.

Static and dynamic type checking in practice

The choice between static and dynamic typing requires trade-offs.
Static typing can find type errors reliably at compile time. This should increase the reliability of the delivered program. However, programmers disagree over how commonly type errors occur, and thus what proportion of those bugs which are written would be caught by static typing. Static typing advocates believe programs are more reliable when they have been well type-checked, while dynamic typing advocates point to distributed code that has proven reliable and to small bug databases. The value of static typing, then, presumably increases as the strength of the type system is increased. Advocates of dependently typed languages such as Dependent ML and Epigram have suggested that almost all bugs can be considered type errors, if the types used in a program are properly declared by the programmer or correctly inferred by the compiler.[3]
Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that are in use, it can produce optimized machine code. Further, compilers for statically typed languages can find assembler shortcuts more easily. Some dynamically typed languages such as Common Lisp allow optional type declarations for optimization for this very reason. Static typing makes this pervasive. See optimization.
By contrast, dynamic typing may allow compilers to run more quickly and allow interpreters to dynamically load new code, since changes to source code in dynamically typed languages may result in less checking to perform and less code to revisit. This too may shorten the edit-compile-test-debug cycle.
Statically typed languages which lack type inference (such as Java and C) require that programmers declare the types they intend a method or function to use. This can serve as additional documentation for the program, which the compiler will not permit the programmer to ignore or allow to drift out of synchronization. However, a language can be statically typed without requiring type declarations (examples include Haskell, Scala and C# 3.0), so this is not a necessary consequence of static typing.
Dynamic typing allows constructs that some static type checking would reject as illegal. For example, eval functions, which execute arbitrary data as code, become possible (however, the typing within that evaluated code might remain static). Furthermore, dynamic typing better accommodates transitional code and prototyping, such as allowing a placeholder data structure (mock object) to be transparently used in place of a full-fledged data structure (usually for the purposes of experimentation and testing). Recent enhancements to statically typed languages (e.g. Haskell Generalized algebraic data types) have allowed eval functions to be written in a statically type checked way.[4]
Dynamic typing typically makes metaprogramming more effective and easier to use. For example, C++ templates are typically more cumbersome to write than the equivalent Ruby or Python code.[citation needed] More advanced run-time constructs such as metaclasses and introspection are often more difficult to use in statically typed languages.
Static typing expands the possibilities for programmatic refactoring. For example, in a statically typed language, analysis of the source code can reveal all callers of a method, and hence a tool can consistently rename the method throughout all of its uses. In dynamic languages, this kind of analysis is either impossible or more difficult because the reference of a name (e.g. a method name) cannot surely be determined until runtime.

Strong and weak typing

One definition of strongly typed involves preventing success for an operation on arguments which have the wrong type. A C cast gone wrong exemplifies the problem of absent strong typing; if a programmer casts a value from one type to another in C, not only must the compiler allow the code at compile time, but the runtime must allow it as well. This may permit more compact and faster C code, but it can make debugging more difficult.
Some observers use the term memory-safe language (or just safe language) to describe languages that do not allow undefined operations to occur. For example, a memory-safe language will check array bounds at run time, or else statically guarantee (i.e., at compile time, before execution) that array accesses outside the array boundaries cannot occur, so that the error is caught at compile time rather than at run time.
Weak typing means that a language implicitly converts (or casts) types when they are used. Consider the following example:
var x := 5;    // (1)  (x is an integer)
var y := "37"; // (2)  (y is a string)
x + y;         // (3)  (?)

In a weakly typed language, the result of this operation is not clear. Some languages, such as Visual Basic, would produce runnable code producing the result 42: the system would convert the string "37" into the number 37 to forcibly make sense of the operation. Other languages like JavaScript would produce the result "537": the system would convert the number 5 to the string "5" and then concatenate the two. In both Visual Basic and JavaScript, the resulting type is determined by rules that take both operands into consideration. In some languages, such as AppleScript, the type of the resulting value is determined by the type of the left-most operand only.
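For contrast, a dynamically but more strongly typed language such as Python refuses to guess: the mixed-type operation raises an error, and the programmer must convert explicitly. A small sketch:
x = 5
y = "37"
# x + y             # raises TypeError: unsupported operand types for +
print(x + int(y))   # 42    - explicit numeric conversion chosen by the programmer
print(str(x) + y)   # '537' - the other reading, also made explicit
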

Safely and unsafely typed systems

A third way of categorizing the type system of a programming language uses the safety of typed operations and conversions. Computer scientists consider a language "type-safe" if it does not allow operations or conversions which lead to erroneous conditions.

var x := 5;     // (1)
var y := "37";  // (2)
var z := x + y; // (3)
In languages like Visual Basic, variable z in the example acquires the value 42. While the programmer may or may not have intended this, the language defines the result specifically, and the program does not crash or assign an ill-defined value to z. In this respect, such languages are type-safe; however, if the value of y were a string that could not be converted to a number (e.g., "hello world"), the results would be undefined. Such languages are type-safe (in that they will not crash) but can easily produce undesirable results.
Now let us look at the same example in C:
int x = 5;        /* an integer */
char y[] = "37";  /* an array of char holding '3', '7', '\0' */
char* z = x + y;  /* pointer arithmetic: z points 5 characters past the start of y */
In this example z will point to a memory address five characters beyond y, equivalent to three characters after the terminating zero character of the string pointed to by y. The content of that location is undefined, and might lie outside addressable memory. The mere computation of such a pointer may result in undefined behavior (including the program crashing) according to C standards, and in typical systems dereferencing z at this point could cause the program to crash. We have a well-typed, but not memory-safe program — a condition that cannot occur in a type-safe language.

Polymorphism and types

The term "polymorphism" refers to the ability of code (in particular, methods or classes) to act on values of multiple types, or to the ability of different instances of the same data-structure to contain elements of different types. Type systems that allow polymorphism generally do so in order to improve the potential for code re-use: in a language with polymorphism, programmers need only implement a data structure such as a list or an associative array once, rather than once for each type of element with which they plan to use it. For this reason computer scientists sometimes call the use of certain forms of polymorphism generic programming. The type-theoretic foundations of polymorphism are closely related to those of abstraction, modularity and (in some cases) subtyping.

Duck typing

In "duck typing," a statement calling a method m on an object does not rely on the declared type of the object; it requires only that the object, of whatever type, implement the method called. One way of looking at this is that in "duck" typing systems the type of an object is intrinsic to the object and is determined by what methods it implements; hence a "duck" typing system is by definition type-safe, since one can only invoke operations an object actually implements. Another way of looking at this is that the object is a member of several types, including a type that describes the fact that it "has a method m." Type checking, however, occurs only on demand at runtime, every time the method m needs to be executed, not at compile time or load time.
Duck typing differs from structural typing in that, if the part (of the whole module structure) needed for a given local computation is present at runtime, the duck type system is satisfied in its type identity analysis. On the other hand, a structural type system would require the analysis of the whole module structure at compile-time to determine type identity or type dependence.
Duck typing differs from a nominative type system in a number of aspects. The most prominent ones are that, for duck typing, type information is determined at runtime (as contrasted to compile-time) and the name of the type is irrelevant to determine type identity or type dependence; only partial structure information is required for that, for a given point in the program execution.
Initially coined by Alex Martelli in the Python community, duck typing uses the premise that (referring to a value) "if it walks like a duck, and quacks like a duck, then it is a duck".
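A minimal Python sketch of duck typing: neither class declares a common interface or base class, yet the call succeeds for any object that happens to implement quack().
class Duck:
    def quack(self):
        return "Quack!"

class Person:
    def quack(self):
        return "I'm quacking like a duck."

def make_it_quack(thing):
    # No declared type required: any object with a quack() method will do.
    print(thing.quack())

for thing in (Duck(), Person()):
    make_it_quack(thing)
# make_it_quack(42) would fail at runtime: int has no attribute 'quack'
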

Specialized type systems

Many type systems have been created that are specialized for use in certain environments, with certain types of data, or for out-of-band static program analysis. Frequently these are based on ideas from formal type theory and are only available as part of prototype research systems.

Dependent types

Dependent types are based on the idea of using scalars or values to more precisely describe the type of some other value. For example, "matrix(3,3)" might be the type of a 3×3 matrix. We can then define typing rules such as the following rule for matrix multiplication:
matrix_multiply : matrix(k,m) × matrix(m,n) → matrix(k,n)
where k, m, n are arbitrary positive integer values. A variant of ML called Dependent ML has been created based on this type system; but because type-checking conventional dependent types is undecidable, not all programs using them can be type-checked without some kind of limitation. Dependent ML limits the sort of equality it can decide to Presburger arithmetic. Other languages, such as Epigram, make all expressions in the language terminate, so that type checking remains decidable; it is also possible to make a language Turing complete at the price of undecidable type checking, as in Cayenne.
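A dependently typed compiler would reject mismatched dimensions before the program runs. As a loose runtime analogue only, the following Python sketch carries the dimensions in the values and checks the matrix(k,m) × matrix(m,n) → matrix(k,n) rule when multiplication is attempted:
def matrix_multiply(a, b):
    """Runtime analogue of matrix(k,m) x matrix(m,n) -> matrix(k,n)."""
    k, m = len(a), len(a[0])
    m2, n = len(b), len(b[0])
    if m != m2:  # a dependent type system would reject this statically
        raise TypeError("cannot multiply matrix(%d,%d) by matrix(%d,%d)"
                        % (k, m, m2, n))
    return [[sum(a[i][j] * b[j][c] for j in range(m)) for c in range(n)]
            for i in range(k)]

print(matrix_multiply([[1, 2]], [[3], [4]]))  # matrix(1,2) x matrix(2,1) -> [[11]]
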

Linear types

Linear types, based on the theory of linear logic, and closely related to uniqueness types, are types assigned to values having the property that they have one and only one reference to them at all times. These are valuable for describing large immutable values such as strings, files, and so on, because any operation that simultaneously destroys a linear object and creates a similar object (such as 'str = str + "a"') can be optimized "under the hood" into an in-place mutation. Normally this is not possible because such mutations could cause side effects on parts of the program holding other references to the object, violating referential transparency. They are also used in the prototype operating system Singularity for interprocess communication, statically ensuring that processes cannot share objects in shared memory in order to prevent race conditions. The Clean language (a Haskell-like language) uses this type system in order to gain a lot of speed while remaining safe.

Intersection types

Intersection types are types describing values that belong to both of two other given types with overlapping value sets. For example, in most implementations of C the signed char has range -128 to 127 and the unsigned char has range 0 to 255, so the intersection type of these two types would have range 0 to 127. Such an intersection type could be safely passed into functions expecting either signed or unsigned chars, because it is compatible with both types.
Intersection types are useful for describing overloaded function types: For example, if "int → int" is the type of functions taking an integer argument and returning an integer, and "float → float" is the type of functions taking a float argument and returning a float, then the intersection of these two types can be used to describe functions that do one or the other, based on what type of input they are given. Such a function could be passed into another function expecting an "int → int" function safely; it simply would not use the "float → float" functionality.
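Python has no intersection types, but the overloading side of the idea can be sketched with functools.singledispatch, which selects an implementation by the argument's runtime type; the combined function then behaves like "int → int" and "float → float" at once. A loose analogue, not a true intersection type:
from functools import singledispatch

@singledispatch
def halve(x):
    raise TypeError("no overload for %s" % type(x).__name__)

@halve.register(int)      # the "int -> int" half
def _(x):
    return x // 2

@halve.register(float)    # the "float -> float" half
def _(x):
    return x / 2.0

print(halve(7))      # 3   (integer overload)
print(halve(7.0))    # 3.5 (float overload)
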
In a subclassing hierarchy, the intersection of a type and an ancestor type (such as its parent) is the most derived type. The intersection of sibling types is empty.
The Forsythe language includes a general implementation of intersection types. A restricted form is refinement types.

Union types

Union types are types describing values that belong to either of two types. For example, in C, the signed char has range -128 to 127, and the unsigned char has range 0 to 255, so the union of these two types would have range -128 to 255. Any function handling this union type would have to deal with integers in this complete range. More generally, the only valid operations on a union type are operations that are valid on both types being unioned. C's "union" concept is similar to union types, but is not typesafe, because it permits operations that are valid on either type, rather than on both. Union types are important in program analysis, where they are used to represent symbolic values whose exact nature (e.g., value or type) is not known.
In a subclassing hierarchy, the union of a type and an ancestor type (such as its parent) is the ancestor type. The union of sibling types is a subtype of their common ancestor (that is, all operations permitted on their common ancestor are permitted on the union type, but they may also have other valid operations in common).
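In Python, the discipline a union type imposes can be sketched at runtime: a function accepting int-or-str may only apply operations valid for both until it narrows the union with an isinstance check. A minimal sketch:
def describe(value):
    # value is conceptually int | str: only operations valid for BOTH
    # types are safe here (e.g. printing).
    print("got:", value)
    # Narrowing the union recovers the type-specific operations.
    if isinstance(value, int):
        print("twice the number is", value * 2)
    else:
        print("uppercased:", value.upper())

describe(-7)
describe("hello")
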

Existential types

Existential types are frequently used to represent modules and abstract data types because of their ability to separate implementation from interface. For example, in C pseudocode, the type "T = ∃X { X a; int f(X); }" describes a module interface that has a data member of type X and a function that takes a parameter of the same type X and returns an integer. This could be implemented in different ways; for example:
  • intT = { int a; int f(int); }
  • floatT = { float a; int f(float); }
These types are both subtypes of the more general existential type T and correspond to concrete implementation types, so any value of one of these types is a value of type T. Given a value "t" of type "T", we know that "t.f(t.a)" is well-typed, regardless of what the abstract type X is. This gives flexibility for choosing types suited to a particular implementation while clients that use only values of the interface type — the existential type — are isolated from these choices.
In general it is impossible for the typechecker to infer which existential type a given module belongs to. In the above example, intT = { int a; int f(int); } could also have the type ∃X { X a; int f(int); }. The simplest solution is to annotate every module with its intended type, e.g. (we depart from the C-style syntax of writing the type first to make the example clear):
  • intT = { int a; int f(int); } as ∃X { X a; int f(X); }
Although abstract data types and modules had been implemented in programming languages for quite some time, it wasn't until 1988 that John C. Mitchell and Gordon Plotkin established the formal theory under the slogan: "Abstract [data] types have existential type".[5] The theory is a second-order typed lambda calculus similar to System F, but with existential instead of universal quantification. Existential types are implemented in Haskell98.

Explicit or implicit declaration and inference

Many static type systems, such as those of C and Java, require type declarations: The programmer must explicitly associate each variable with a particular type. Others, such as Haskell's, perform type inference: The compiler draws conclusions about the types of variables based on how programmers use those variables. For example, given a function f(x,y) which adds x and y together, the compiler can infer that x and y must be numbers – since addition is only defined for numbers. Therefore, any call to f elsewhere in the program that specifies a non-numeric type (such as a string or list) as an argument would signal an error.
Numerical and string constants and expressions in code can and often do imply type in a particular context. For example, an expression 3.14 might imply a type of floating-point, while [1, 2, 3] might imply a list of integers – typically an array.
Type inference is in general possible if it is decidable in the type theory in question. Moreover, even if inference is undecidable in general for a given type theory, inference is often possible for a large subset of real-world programs. Haskell's type system, a version of Hindley-Milner, is a restriction of System Fω to so-called rank-1 polymorphic types, in which type inference is decidable. Most Haskell compilers allow arbitrary-rank polymorphism as an extension, but this makes type inference undecidable. (Type checking is decidable, however, and rank-1 programs still have type inference; higher rank polymorphic programs are rejected unless given explicit type annotations.)

Types of types

A type of types is a kind. Kinds appear explicitly in typeful programming, such as a type constructor in the Haskell programming language.

Compatibility: equivalence and subtyping

A type-checker for a statically typed language must verify that the type of any expression is consistent with the type expected by the context in which that expression appears. For instance, in an assignment statement of the form x := e, the inferred type of the expression e must be consistent with the declared or inferred type of the variable x. This notion of consistency, called compatibility, is specific to each programming language.
If the type of e and the type of x are the same and assignment is allowed for that type, then this is a valid expression. In the simplest type systems, therefore, the question of whether two types are compatible reduces to that of whether they are equal (or equivalent). Different languages, however, have different criteria for when two type expressions are understood to denote the same type. These different equational theories of types vary widely, two extreme cases being structural type systems, in which any two types are equivalent that describe values with the same structure, and nominative type systems, in which no two syntactically distinct type expressions denote the same type (i.e., types must have the same "name" in order to be equal).
In languages with subtyping, the compatibility relation is more complex. In particular, if A is a subtype of B, then a value of type A can be used in a context where one of type B is expected, even if the reverse is not true. Like equivalence, the subtype relation is defined differently for each programming language, with many variations possible. The presence of parametric or ad hoc polymorphism in a language may also have implications for type compatibility.

Controversy

There are often conflicts between those who prefer statically typed languages and those who prefer dynamically typed languages. The first group advocates early detection of type errors during compilation and increased runtime performance; the latter group argues that rapid prototyping is possible with a more dynamic type system and that type errors are only a small subset of the errors in a program.[6][7] Related to this is the consideration that statically typed languages with type inference often do not require all types to be declared manually, which lowers the need for the programmer to explicitly specify the types of variables; and some dynamic languages have run-time optimisers[8][9] that can generate fast code approaching the speed of static-language compilers, often by using partial type inference.