Security-Portal.cz is a web portal focused on computer security, hacking, anonymity, computer networks, programming, encryption, exploits, and Linux and BSD systems. It runs many interesting services and supports its fans in interesting projects.

Category

Expert Interview: Securing Your Third-Party Vendor Network

InfoSec Institute Resources - 1 hour 23 min ago

One of the most complex information security challenges is ensuring a proper level of protection when a third party is involved. In most cases, there is no direct control over the vendor infrastructure. This means we must rely on contracts and/or agreements and, in the end, trust that our partners will follow defined security requirements. Also, since […]

The post Expert Interview: Securing Your Third-Party Vendor Network appeared first on InfoSec Resources.

Expert Interview: Securing Your Third-Party Vendor Network was first posted on January 23, 2018 at 8:13 am.
©2017 "InfoSec Resources". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement. Please contact me at darren.dalasta@infosecinstitute.com
Category: Hacking & Security

Intel warns data centers: Do not install our Spectre/Meltdown patches. They cause reboots

Zive.cz - security - 1 hour 37 min ago
Intel's fix for the Meltdown/Spectre flaws is getting complicated. Over the past weeks the processor maker gradually offered firmware updates that did fix the flaw but, on the other hand, caused greater instability and more frequent reboots. On a home computer this is of course not such a big problem, ...
Category: Hacking & Security

Cybersecurity Certification Courses – CISA, CISM, CISSP

The Hacker News - 1 hour 59 min ago
The year 2017 saw some of the biggest cybersecurity incidents—from high profile data breaches in Equifax and Uber impacting millions of users to thousands of businesses and millions of customers being affected by the global ransomware threats like WannaCry and NotPetya. The year ended, but it did not take away the airwaves of cybersecurity incidents, threats, data breaches, and hacks. The
Category: Hacking & Security

Gas pump malware tricks customers into paying for more than they pump

Sophos Naked Security - 2 hours 59 min ago
Malware-infected gas pumps display the false data, and customers end up with up to 7% less gas than they paid for

How a teen used social engineering to take on the FBI and CIA

Sophos Naked Security - 3 hours 19 min ago
Of all the adversaries facing the US in cyberspace, there is one that the FBI and CIA often seem to struggle to contain: teenagers.

Twitter will email 677,775 users who engaged with Russian election trolls

Sophos Naked Security - 3 hours 38 min ago
It's found and banned thousands more automated Russian accounts, and it's planning for detecting such accounts better in time for mid-terms.

Intel Warns Users Not to Install Its 'Faulty' Meltdown and Spectre Patches

The Hacker News - 3 hours 42 min ago
Don't install Intel's patches for Spectre and Meltdown chip vulnerabilities. Intel on Monday warned that you should stop deploying its current versions of Spectre/Meltdown patches, which Linux creator Linus Torvalds calls 'complete and utter garbage.' Spectre and Meltdown are security vulnerabilities disclosed by researchers earlier this month in many processors from Intel, ARM and AMD used
Category: Hacking & Security

OnePlus has a serious problem: 40 thousand users may have had their payment cards compromised

Zive.cz - security - 4 hours 32 min ago
Last week the first outraged posts appeared on the OnePlus forum from users reporting misuse of their payment cards. These cards had been used to pay for ordered OnePlus phones. The company therefore launched an internal investigation, which reached a rather alarming conclusion: payment ...
Category: Hacking & Security

Windows 10 will now reveal what data it sends from your computer to Microsoft

Zive.cz - security - 5 hours 29 min ago
The latest Windows 10 test build for Insiders contains two new items suggesting that, in the upcoming major update, the system will let users view the diagnostic data sent to Redmond and even delete it. For now the links lead nowhere, so the feature is at a very early stage ...
Category: Hacking & Security

Bug Bounty Hackers Make More Money Than Average Salaries, Report Finds

LinuxSecurity.com - 6 hours 5 min ago
LinuxSecurity.com: Bug bounty programs exist to reward ethical hackers with a financial award (the "bounty") for responsibly disclosing security vulnerabilities. What types of people participate in bug bounty programs and why do they do it? Those are just a few of the questions that managed bug bounty platform provider HackerOne answers in its 2018 Hacker Report.
Category: Hacking & Security

Hackers steal almost $400 million from cryptocurrency ICOs

LinuxSecurity.com - 6 hours 9 min ago
LinuxSecurity.com: Cyberattackers have managed to line their pockets with almost $400 million in cryptocurrency by targeting ICOs, a new study has found.
Category: Hacking & Security

Optimus multi-prime is the new rule as OpenSSL transforms crypto policies again

LinuxSecurity.com - 6 hours 10 min ago
LinuxSecurity.com: OpenSSL's maintainers have put the squeeze on insecure ciphers, with a raft of changes to how the project operates. The changes were announced here following an OpenSSL management committee (OMC) meeting in London.
Category: Hacking & Security

Critical Flaw in All Blizzard Games Could Let Hackers Hijack Millions of PCs

The Hacker News - 7 hours 1 min ago
A Google security researcher has discovered a severe vulnerability in Blizzard games that could allow remote attackers to run malicious code on gamers’ computers. Played every month by half a billion users—World of Warcraft, Overwatch, Diablo III, Hearthstone and Starcraft II are popular online games created by Blizzard Entertainment. To play Blizzard games online using web browsers, users
Category: Hacking & Security

Uber hit with criticism of “useless” two-factor authentication

Sophos Naked Security - 22 January, 2018 - 23:58
An Indian researcher has created a stir by claiming Uber's 2FA is "useless". What's the full story?

Popular Sonic the HedgeHog Apps at Risk of Leaking User Data to Unverified Servers

Threatpost - 22 January, 2018 - 22:54
Researchers have found three Sega game apps that connect to insecure servers and risk leaking user data.
Category: Hacking & Security

Blockchain networks: possible attacks and ways of protection

InfoSec Institute Resources - 22 January, 2018 - 19:43

Any network can be attacked, and blockchain is no exception. However, attacks on distributed ledgers differ from attacks on conventional computer networks, even secure ones. Here crooks try to manipulate the process of reaching a consensus to change the information added to the ledger. In this article, we will look at the main threats that […]

The post Blockchain networks: possible attacks and ways of protection appeared first on InfoSec Resources.

Blockchain networks: possible attacks and ways of protection was first posted on January 22, 2018 at 12:43 pm.
Category: Hacking & Security

User Account Control(UAC) Bypass Techniques-Part 1

InfoSec Institute Resources - 22 January, 2018 - 19:16

In this article series, we will look at the UAC feature and learn/refresh about famous UAC bypass techniques. Although there are many techniques that have been discovered by the researchers in the past, we are going to investigate those techniques which have been used more often by the malware authors. In this part, we will […]

The post User Account Control(UAC) Bypass Techniques-Part 1 appeared first on InfoSec Resources.

User Account Control(UAC) Bypass Techniques-Part 1 was first posted on January 22, 2018 at 12:16 pm.
Category: Hacking & Security

A silver bullet for the attacker

Kaspersky Securelist - 22 January, 2018 - 16:51

In the past years, the problem of vulnerabilities in industrial automation systems has become increasingly important. The fact that industrial control systems have been developing in parallel with IT systems, relatively independently and often without regard for modern secure coding practices, is probably the main source of ICS security problems. As a result of this, numerous custom solutions have appeared, including proprietary network protocols and algorithms for authentication and encryption. It is these solutions that were the main source of threats discovered by ICS IT security researchers. At the same time, we can see that industrial automation systems derive some of their problems from common technologies (examples include CodeSys Runtime, Microsoft Windows vulnerabilities, etc.).

Companies attach different priority levels to such problems and the risks associated with them. It is obvious to everybody that vulnerability information should never be disclosed until a patch is released. However, many companies believe that this information should not be published even when a patch is available. For software developers, this is always a blow to their reputation. And companies that use vulnerable systems are not always physically able to install a patch, or this installation may involve significant costs (interrupted operation of the systems to be updated, the cost of work related to installing updates, etc.).

We assess risks based on our experience as a security system developer and supplier. We are convinced that it is absolutely essential to inform users of vulnerable software about the new threat and the need to update their software as soon as possible. This certainly does not guarantee that all users of vulnerable systems will promptly update them and the threat will go away. However, in our experience, if this is not done, very few users update their systems in a timely manner, even if patches are available. We confront hundreds of thousands of new threats every day and we can see that threat actors are on a constant lookout for new attack opportunities. And we realize that by keeping silent about problems we give those threat actors a chance.

This is why we decided to share information on one of our discoveries: according to our research, connecting a software license management token to a computer may open a hidden remote access channel for an attacker.

Why we decided to analyze SafeNet Sentinel

While performing various penetration tests, Kaspersky Lab ICS CERT experts repeatedly encountered the same service on the computers of customers who used software and hardware solutions by different industrial vendors. The experts didn’t attach much importance to it until it was found to be vulnerable. The service was hasplms.exe, which is part of the SafeNet Sentinel hardware-based solution by Gemalto. The solution provides license control for software used by customers and is widely used in ICS and IT systems.

The solution’s software part consists of a driver, a web application and a set of other software components. The hardware part is a USB token. The token needs to be connected to a PC or server on which a software license is required. Some of the USB token models are listed in the table below.

License control solutions of this type are based on the following operating principles: a software product requires a license to operate properly; when a USB token is plugged into the computer, the software “sees” the license and becomes fully functional. The token must be plugged in every time the software is started and remain connected while it is in use. The software part of the Gemalto solution is installed once and remains functional regardless of the life cycle of the software requiring a token.

This Gemalto solution is used in products by other software vendors, including such companies as ABB, General Electric, HP, Cadac Group, Zemax and many other organizations, the number of which, according to some estimates, reaches 40 thousand.

According to the results of independent research conducted by Frost and Sullivan in 2011, SafeNet Sentinel, which is currently owned by Gemalto, has a 40% market share for license control solutions in North America and over 60% in Europe.

The number of end users who use Gemalto solutions is not known. However, if each company has 100 clients, the number of users is in the millions. Unfortunately, few people realize that connecting a token to a computer to control licenses may not be a safe thing to do.

Vulnerabilities and attack vectors

From researchers’ viewpoint, hasplms.exe exhibited a rather curious behavior in the system: it could be remotely accessed and communicated with on open port 1947. The protocol type was defined by the network packet header – either HTTP or a proprietary binary protocol was used. The service also had an API of its own, which was based on the HTTP protocol.

Analyzing the service was made more difficult by the fact that the binary file used a VMProtect-type protector and generated its bytecode from the original Gemalto code. Due to this, it was decided to use fuzzing as the main tool for analyzing the vulnerable service’s behavior.

First of all, we looked at the localization function – the user could download language packs consisting of two files, one of which was localize.xml. The second file, in HTML format, had parameters, one of which turned out to be vulnerable to buffer overflow. It would have been a simple vulnerability, if it wasn’t for one curious detail: although, as mentioned above, a protector was used, for some reason the developers did not use any of the classical mechanisms providing protection from such binary vulnerabilities (such as Stack Canary, Stack Cookie, ASLR, etc.). As a result, a simple buffer overflow could allow an attacker to execute arbitrary code on the remote system.

Note that such software development flaws are very rare in modern solutions. As a rule, secure coding practices are implemented when developing serious commercial products (such as SDL – security development lifecycle), which means that security is designed into applications at the development stage, rather than being implemented as an additional option.

This attack vector can be used without LPE (local privilege escalation) – the vulnerable process runs with SYSTEM privileges, enabling malicious code to run with the highest privileges.

Sample script loading a language pack file

Result of Buffer Overflow exploitation, leading to RCE

The vulnerability was assigned the number CVE-2017-11496.

This was just one of the vulnerabilities we found. And the overall result of our research was disquieting.

In late 2016 – early 2017, 11 vulnerabilities were identified: two allowed remote code execution if exploited and nine were denial-of-service vulnerabilities.

By June 2017, Kaspersky Lab ICS CERT had identified three more vulnerabilities: an XML bomb and two denial-of-service flaws, one of which could potentially lead to remote execution of arbitrary code.

In total, 14 vulnerabilities have been identified, all quite dangerous (for example, exploitation of each of the Remote Execution of Arbitrary Code type vulnerabilities is automatically performed with SYSTEM privileges, i.e., the highest privilege level in Windows).

All attack vectors affecting the vulnerable service were multi-stage.

We promptly sent all information on the vulnerabilities identified to Gemalto. The vulnerabilities were assigned the following respective CVE numbers:

In addition to vulnerability descriptions, we sent a description of peculiar functionality to Gemalto.

Peculiar functionality

Kaspersky Lab ICS CERT experts have found that hasplms.exe has some rather unusual functionality:

  • When a Gemalto USB token is first connected to a computer (even if the active session is blocked), a driver and a service that accepts network connections on port 1947 are installed if Internet access is available.
  • If a driver is manually downloaded from the Gemalto website and installed, a driver and service that accept network connections on port 1947 are installed and port 1947 is added to Windows firewall exceptions.
  • If Gemalto software is installed as part of a third-party installation file, port 1947 is also added to Windows firewall exceptions.
  • There is an API function which enables or disables the administrative panel in the web interface, making it possible to modify the settings of the program part of the SafeNet Sentinel hardware-based solution. The panel is available by default on the localhost IP address – 127.0.0.1.
  • The API can be used to change the internal proxy settings for updating language packs.
  • After changing the proxy server, the service’s internal logic can be used to obtain the NTLM hash of the user account under which the hasplms.exe process is running (i.e., SYSTEM).

This appears to be an undocumented feature and can be used for stealthy remote access. This means that remote attackers can use these capabilities to gain access to the administrative panel of the Gemalto software, carry out attacks with system user privileges and conceal their presence after completing these attacks.

As mentioned above, Gemalto representatives were informed of this attack vector.

Non-transparent security

Solutions, technologies or individual software modules used by many third-party vendors often do not undergo proper security testing. This potentially opens up new attack vectors. At the same time, closing vulnerabilities in such products, which are often used, among other applications, in banking and industrial control systems, is not always a smooth process: for some reason, vendors of such systems are in no hurry to notify their users of problems identified in their products.

In early 2017, we sent information about 11 vulnerabilities we had identified to Gemalto. It was only in late June that, in response to our repeated requests, the vendor informed us that a patch had been released and information about the vulnerabilities that had been closed, as well as a new version of the driver, could be found on the company’s internal user portal.

On June 26, we informed Gemalto of the suspicious functionality and of three more vulnerabilities. This time, things went quicker: on July 21 the vendor released a private notice on a new driver version – without any mention of the vulnerabilities closed.

According to Gemalto, the company has notified all of its customers of the need to update the driver via their account dashboards. However, this was apparently not sufficient: after we published information about the vulnerabilities identified, we were contacted by several developers of software which uses hasplms. It became clear from our communication with them that they were not aware of the problem and continued to use versions of the product with multiple vulnerabilities.

Update software to the current version (7.6) ASAP

We urge those users and companies that use Gemalto’s SafeNet Sentinel to install the latest (secure) version of the driver as soon as possible or contact Gemalto for instructions on updating the driver.

In the case of installing the driver via Microsoft Windows Update servers, we recommend checking hasplms.exe to make sure it is the latest version. If an obsolete version is used, it is crucial to install the latest (secure) version of the driver from the vendor’s website or contact Gemalto for instructions on updating the driver.

We also recommend closing port 1947, at least on the external firewall (on the network perimeter) – but only as long as this does not interfere with business processes. This will help to reduce the risk of the vulnerabilities being exploited.
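As a rough way to see where port 1947 is reachable, an administrator could probe hosts they are responsible for with a plain TCP connection attempt. The sketch below is only an illustration: the function name `is_port_open`, the default timeout and the choice of Python are assumptions, not part of the original advisory, and an open port does not by itself prove that hasplms.exe is the listener or that the installed version is vulnerable.

```python
import socket


def is_port_open(host: str, port: int = 1947, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or unresolvable: treat as not open.
        return False
```

Calling `is_port_open("127.0.0.1")` on a workstation checks the local machine; running the same check from outside the network perimeter shows whether the external firewall actually blocks the port.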

Some software vendors who use third-party solutions as part of their products may be very thorough about the security of their own code, while leaving the security of third-party solutions to other companies (the vendors of these solutions). We very much hope that most companies act responsibly both with respect to their own solutions and with respect to third-party solutions used in their products.

MySQL grammar in ANTLR 4

Positive Research Center - 22 January, 2018 - 15:28
The main purpose of a web application firewall is to analyze and filter traffic relevant to an application or a class of applications, such as web applications or database management systems (DBMS). A firewall needs to speak the language of the application it is protecting. For a relational DBMS, the language in question will be an SQL dialect.

Let us assume that the task is to build a firewall to protect a DBMS. In this case, the firewall must recognize and analyze SQL statements in order to determine whether they comply with the security policy. The depth of analysis depends on the task required (for example, detection of SQL injection attacks, access control, or correlation of SQL and HTTP requests). In any case, the firewall must perform lexical, syntactic, and semantic analysis of SQL statements.

Introduction

Formal grammar

With a formal grammar of a language, we can obtain a full picture of how the language is structured and analyze it. Formal grammars help to create statements and recognize them using a syntactic analyzer.

According to the Chomsky hierarchy, there are four types of languages and, consequently, four types of grammars. Grammars are differentiated by their derivations. MySQL is a context-sensitive language. However, a context-sensitive grammar alone cannot yield a large number of language patterns. Generally, a context-free grammar is sufficient for generating the language patterns used in practice. This article describes development of a context-free grammar for MySQL.

Key terms

A language is defined on the basis of its alphabet, that is to say a set of symbols. Letters of the alphabet are combined into meaningful sequences called lexemes. There are different types of lexemes (for example, identifiers, strings, and keywords). A token is a tuple consisting of a lexeme and a type name. A phrase is a sequence of specifically arranged lexemes. Phrases can be used to create statements. A statement is a complete sequence of lexemes that has independent meaning in the context of a given language. The notion of a statement has practical significance only.

An application based on any given language makes use of statements in that language, such as by running or interpreting these statements. From an applied point of view, a phrase is an incomplete structure, which can be a part of a statement. However, phrases are more useful for generating a grammar. Phrases and statements are similar from the point of view of the grammar—they follow certain of its rules with nonterminals on their right side.

Use of a language implies formation or recognition of statements. Recognition refers to the capability, for any sequence of lexemes received as input, to answer the question "Does this sequence constitute a set of valid statements in this language?" as output.

MySQL language

The MySQL language is an SQL dialect used to write queries for the MySQL DBMS. SQL dialects follow the ISO/IEC 9075 Information technology – Database languages – SQL standard (or, strictly speaking, series of standards).

The MySQL dialect is a specific implementation of this standard with particular limitations and additions. Most MySQL statements can be described using a context-free grammar, but some statements require a context-sensitive grammar for their description. Put simply, if a lexeme can affect the recognition of subsequent phrases, then the phrase must be described by a rule of a context-sensitive grammar.

Some MySQL expressions are formed using this approach. For example:

DELIMITER SOME_LITERAL
In this case it is required to memorize SOME_LITERAL, because this literal will be used in subsequent statements instead of ; to mark their ending.
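The need to remember the current delimiter can be illustrated with a small, deliberately simplified splitter. This is a sketch under stated assumptions, not how any MySQL client is implemented: the function name `split_statements` is hypothetical, the input is processed line by line, a statement is assumed to end only at the end of a line, and delimiters inside string literals or comments are not handled.

```python
def split_statements(script: str) -> list[str]:
    """Split a script into statements, tracking the active DELIMITER."""
    delimiter = ";"  # state that a context-free grammar cannot carry
    statements = []
    buffer = []
    for line in script.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # A DELIMITER command changes the terminator for what follows.
        if stripped.upper().startswith("DELIMITER "):
            delimiter = stripped.split(None, 1)[1].strip()
            continue
        buffer.append(stripped)
        if stripped.endswith(delimiter):
            text = " ".join(buffer)
            statements.append(text[: -len(delimiter)].rstrip())
            buffer = []
    if buffer:
        # Trailing text without a terminator is kept as a final statement.
        statements.append(" ".join(buffer))
    return statements
```

The `delimiter` variable is exactly the piece of context that a purely context-free description of the language has no place to store.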

In a procedural extension, loop and compound statements can be tagged with labels having the following structure:

label somephrases label
In this case, label identifiers should be identical. Such a statement can be built only using a context-sensitive grammar.

ANTLR

We selected ANTLR as the parser generator for developing a MySQL parser. ANTLR has the following advantages to recommend it:



ANTLR uses a two-step algorithm for generating recognition code. The first step is to describe the lexical structure of a language, i.e., determine what the tokens are. The second step is to describe the syntactic structure of the language by grouping the tokens from the previous step into statements. Lexical and syntactic structures in ANTLR are described by rules. The lexical structure is defined by the type (lexeme descriptor) and value. To describe the value, a language with regular expression elements, but supporting recursion, is used. The syntactic structure rule is composed of lexeme descriptors based on the statement composition rules in ANTLR 4, which allow defining the structure of lexeme arrangement in a statement or a phrase within a statement.

When creating rules, the core principle of lexical analysis (to which ANTLR is no exception) should be taken into account. A lexer starts by recognizing the longest sequence of symbols in the input stream that can be described by any lexical rule. If multiple matching rules exist, the one with highest precedence is applied.
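The longest-match principle with precedence as a tie-breaker can be sketched in a few lines of Python. This is a toy model rather than ANTLR's actual algorithm: the rule set `RULES` and the function `tokenize` are assumptions made for illustration, and each token is represented as a (lexeme, type) tuple.

```python
import re

# Hypothetical mini rule set, listed in priority order (earlier wins ties).
RULES = [
    ("SELECT", re.compile(r"select", re.IGNORECASE)),
    ("FROM", re.compile(r"from", re.IGNORECASE)),
    ("ID", re.compile(r"[A-Za-z_$][A-Za-z_$0-9]*")),
    ("DECIMAL", re.compile(r"[0-9]+")),
    ("STAR", re.compile(r"\*")),
    ("SPACE", re.compile(r"\s+")),
]


def tokenize(text):
    """Return (lexeme, type) tuples using longest match, then rule order."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_type, best_len = None, 0
        for type_name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on equal lengths the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_type, best_len = type_name, len(m.group())
        if best_type is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_type != "SPACE":  # whitespace is filtered out
            tokens.append((text[pos:pos + best_len], best_type))
        pos += best_len
    return tokens
```

With these rules, `select` is matched by both SELECT and ID at the same length, so the earlier rule wins, while `selection` goes to ID because that match is longer.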

Without using semantic predicates in ANTLR, only a context-free grammar can be built. An advantage in this case is that such a grammar does not depend on the runtime environment. The grammar proposed in this article for the MySQL dialect is built without using semantic predicates.

Lexer

Getting started

The first step in developing a grammar is to define a list of the lexeme types that occur in the language. The recognizer accepts alphabet symbols, from which it must form lexemes; symbols that are not used as a part of lexemes, such as spaces and comments, can be filtered out. Thanks to filtering, only meaningful lexemes of the language are set aside for further analysis. Spaces and comments can be filtered out as follows:

SPACE: [ \t\r\n]+ -> channel(HIDDEN);
COMMENT_INPUT: '/*' .*? '*/' -> channel(HIDDEN);
LINE_COMMENT: ('-- ' | '#') ~[\r\n]* ('\r'? '\n' | EOF) -> channel(HIDDEN);
Potential lexical errors can also be taken into account, and unknown symbols can be omitted:

ERROR_RECOGNITION: . -> channel(ERRORCHANNEL);
If a symbol is not recognized by any other lexical rule, it is recognized by the rule ERROR_RECOGNITION. This rule is placed at the end of the grammar, giving it the lowest priority.

Now we can start identifying lexemes. Lexemes can be classified under the following types:
  • Keywords
  • Identifiers
  • Literals
  • Special symbols

If there is no obvious (or implicit) intersection between these types of lexemes in the language, it is only required to describe all lexemes. However, if there are any intersections, they should be resolved. The situation becomes complicated because some lexemes require a regular grammar for recognition. In MySQL, this is an issue for identifiers with a dot (fully qualified name) and keywords that can be identifiers.

Identifiers with a dot

Recognizing MySQL identifiers that start with a numeral poses certain difficulties, because the "." symbol can occur both in full column names and in real literals:

select table_name.column_name as full_column_name ...
select 1.e as real_number ...

Therefore, it is required to recognize correctly a full column name in the first case, and a real literal in the second case. An intersection here is caused by the fact that identifiers in MySQL can start with numerals.

MySQL sees the phrase

someTableName.1SomeColumn
as a sequence of three tokens:

(someTableName, identifier), (. , dot delimiter), (1SomeColumn, identifier)

In this example, it is quite natural to use the following rules for recognition:

DOT: '.';
ID_LITERAL: [0-9]*[a-zA-Z_$][a-zA-Z_$0-9]*;

and the following rule for numerals:

DECIMAL_LITERAL: [0-9]+;
Tokenization results in a sequence of four tokens:

(someTableName, identifier), (. , dot delimiter), (1, number), (SomeColumn, identifier)

To avoid ambiguity, an auxiliary structure can be introduced to recognize identifiers:

fragment ID_LITERAL: [0-9]*[a-zA-Z_$][a-zA-Z_$0-9]*;
and prioritized rules can be defined:

DOT_ID: '.' ID_LITERAL;
...
ID: ID_LITERAL;
...
DOT: '.';

Since ANTLR recognizes sequences of maximum length, "." will surely not be recognized as a separate symbol.

Strings

Using strings as an example, we can illustrate one more rule of lexical analysis in ANTLR. A string in MySQL is a sequence of almost any characters enclosed in single or double quotation marks. A string in single quotation marks cannot contain an unescaped backslash or single quotation mark, because the lexer would not be able to determine where the string ends. If such characters have to be used, a single quote is written as two consecutive single quotes, or the character is preceded by a backslash. Moreover, an escape character inside a string cannot stand alone, since there must be something for it to escape. Therefore, use of this character by itself, without an accompanying character, should also be prohibited. As a result, we obtain the following fragment of a lexical rule:

fragment SQUOTA_STRING: '\'' ('\\'. | '\'\'' | ~('\'' | '\\'))* '\'';
  • '\\'. allows a backslash and the symbol it escapes.
  • '\'\'' allows a sequence of two single quotation marks.
  • ~('\'' | '\\') prohibits a standalone single quotation mark or escape character.
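The three alternatives of the fragment translate almost directly into a regular expression, which makes it easy to try the rule on sample inputs. The sketch below is an approximation for experimentation only; the helper name `is_single_quoted_string` is hypothetical, and `re.DOTALL` is used on the assumption that the ANTLR wildcard also matches line breaks.

```python
import re

# '\\.'    -> a backslash followed by any character
# ''       -> two consecutive single quotes
# [^'\\]   -> any character except a quote or a backslash
SQUOTA_STRING = re.compile(r"'(?:\\.|''|[^'\\])*'", re.DOTALL)


def is_single_quoted_string(text: str) -> bool:
    """Return True if the whole text is one valid single-quoted string."""
    return SQUOTA_STRING.fullmatch(text) is not None
```

A doubled quote or a backslash-escaped quote is accepted, while a lone quote in the middle or a trailing lone backslash makes the whole input invalid.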

Keywords

An ANTLR lexer, by contrast with a parser, applies rules in order of precedence. Rules that have been defined earlier have a higher priority than those described later. This approach gives a clear instruction for rule sorting: higher priority is given to specific rules (defining keywords and special symbols), followed by general rules (for recognizing literals, variables, identifiers, etc.).

Special comment type in MySQL

MySQL uses a special comment style that can span multiple lines. Such comments allow creating queries compatible with other DBMSs with no need to follow MySQL-specific requirements. When generating a query, MySQL will analyze the text in such comments. To recognize special MySQL comments, we can use the following rule:

SPEC_MYSQL_COMMENT: '/*!' .+? '*/' -> channel(MYSQLCOMMENT);
However, using this rule by itself is not enough for correctly parsing queries.

Assume that a query of the following kind is received:

select name, info /*!, secret_info */ from users;

Applying the above-mentioned rule, we obtain the following sequence of tokens:

(SELECT, 'select') (ID, 'name') (COMMA, ',') (ID, 'info') (SPEC_MYSQL_COMMENT, '/*!, secret_info */') (FROM, 'from') (ID, 'users') (SEMI, ';')

Whereas the standard MySQL lexer recognizes slightly different tokens:

(SELECT, 'select') (ID, 'name') (COMMA, ',') (ID, 'info') (COMMA, ',') (ID, 'secret_info') (FROM, 'from') (ID, 'users') (SEMI, ';')

That is why correct recognition of comments written in the unique MySQL style requires additional processing:
  1. Source text is recognized by a special lexer for preprocessing.
  2. Values are extracted from the SPEC_MYSQL_COMMENT tokens and a new text is created, which will be processed only by a MySQL server.
  3. The newly created text is processed using an ordinary parser and lexer.

A lexer for preprocessing splits the input stream into phrases that are part of:
  • Special comments (SPEC_MYSQL_COMMENT)
  • Main queries (TEXT)

The rules can be arranged in the following way:

lexer grammar mysqlPreprocessorLexer;

channels { MYSQLCOMMENT }

TEXT: ~'/'+;
SPEC_MYSQL_COMMENT: '/*!' .+? '*/'; //-> channel(MYSQLCOMMENT);
SLASH: '/' -> type(TEXT);

A pre-lexer splits query code into a sequence of the SPEC_MYSQL_COMMENT and TEXT tokens. If a MySQL statement is processed, values extracted from the SPEC_MYSQL_COMMENT tokens are combined with values of the TEXT tokens. Then the resulting text is processed by the standard MySQL lexer. If another SQL dialect is used, the SPEC_MYSQL_COMMENT tokens are simply removed or set aside.
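The preprocessing step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses a regular expression instead of the ANTLR pre-lexer, does not protect string literals that happen to contain "/*!", and the function names split_phrases and rebuild are invented for the example.

```python
import re

# Same shape as the SPEC_MYSQL_COMMENT rule; the group captures the comment body.
SPEC_COMMENT = re.compile(r"/\*!(.+?)\*/", re.DOTALL)


def split_phrases(query):
    """Split a query into (kind, text) pairs, kind being TEXT or SPEC_MYSQL_COMMENT."""
    tokens = []
    pos = 0
    for m in SPEC_COMMENT.finditer(query):
        if m.start() > pos:
            tokens.append(("TEXT", query[pos:m.start()]))
        tokens.append(("SPEC_MYSQL_COMMENT", m.group(1)))
        pos = m.end()
    if pos < len(query):
        tokens.append(("TEXT", query[pos:]))
    return tokens


def rebuild(query, mysql=True):
    """Build the text handed to the ordinary lexer and parser.

    For MySQL the comment bodies are merged into the query;
    for other dialects the special comments are dropped.
    """
    parts = []
    for kind, text in split_phrases(query):
        if kind == "TEXT" or mysql:
            parts.append(text)
    return "".join(parts)
```

Applied to the query from the example, rebuild(..., mysql=True) yields text in which secret_info is an ordinary column, and rebuild(..., mysql=False) yields the query without it.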

Case insensitivity

Almost all lexemes in MySQL are case-insensitive, which means that the following two queries are identical:

select * from t;
SelECT * fROm t;

Unfortunately, at the time of writing ANTLR does not support case-insensitive tokens. It is therefore necessary to spell tokens out with fragment rules, one per letter, from which the actual tokens are built:

SELECT: S E L E C T;
FROM: F R O M;

fragment S: [sS];
fragment E: [eE];

This makes the grammar less readable. Moreover, for each character the lexer has to choose between two variants (upper or lower case), which has a negative impact on performance.

To make lexer code cleaner and improve performance, the input stream should be normalized, meaning that all symbols should be in the same case (upper or lower). ANTLR supports a special stream that disregards case during lexical analysis, but retains the cases of original token values. These tokens can be used in tree traversal.

Implementation of such a stream for various runtimes has been suggested by @KvanTTT. The implementation can be found in the DAGE project, a cross-platform editor of ANTLR 4 grammars.
As a result, all lexemes are written either in lower or in upper case. Because normally SQL keywords in queries are written in upper case, it was decided to use upper case for the grammar:

SELECT: 'SELECT';
FROM: 'FROM';
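The idea behind such a stream can be shown with a small Python sketch. It is not the implementation referred to above and does not use the ANTLR runtime: the class CaseInsensitiveStream, its get_text method, and the helper match_keyword are hypothetical names that only mimic the lookahead and consume operations of a character stream.

```python
class CaseInsensitiveStream:
    """Simplified character stream: upper-case lookahead, original text preserved."""

    EOF = -1

    def __init__(self, text):
        self.text = text   # original characters, case preserved
        self.index = 0     # position of the next character to consume
        # Upper-case each character separately so that positions stay aligned
        # (a few characters upper-case to more than one character).
        self._lookahead = [c.upper() if len(c.upper()) == 1 else c for c in text]

    def LA(self, offset):
        """Code point `offset` characters ahead (1 = next character), upper-cased."""
        position = self.index + offset - 1
        if offset < 1 or position >= len(self.text):
            return self.EOF
        return ord(self._lookahead[position])

    def consume(self):
        self.index += 1

    def get_text(self, start, stop):
        """Original text for the inclusive range [start, stop]."""
        return self.text[start:stop + 1]


def match_keyword(stream, keyword):
    """Consume an upper-case keyword; return its original spelling or None."""
    for i, expected in enumerate(keyword, start=1):
        if stream.LA(i) != ord(expected):
            return None
    start = stream.index
    for _ in keyword:
        stream.consume()
    return stream.get_text(start, stream.index - 1)
```

Matching the keyword SELECT against the input "SelECT * fROm t;" succeeds because the lookahead is upper-cased, yet the returned token text is still "SelECT", so the original spelling remains available during tree traversal.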

Parser

To describe the syntactic structure of a language, the sequence of the following components should be defined:
  • Statements in a text
  • Phrases in a statement
  • Lexemes and phrases inside larger phrases

Text structure in MySQL

There is an excellent MySQL grammar description, though it is spread across the whole reference guide. The arrangement of statements in a text is given in the section that describes the MySQL client/server messaging protocol. We can see that all statements, except possibly the last one, use a semicolon (;) as the delimiter. Moreover, there is a peculiarity about inline comments: the last statement in a text can end with such a comment. As a result, any valid sequence of statements in MySQL can be represented in the following form:

root
    : sqlStatements? MINUSMINUS? EOF
    ;

sqlStatements
    : (sqlStatement MINUSMINUS? SEMI | emptyStatement)*
      (sqlStatement (MINUSMINUS? SEMI)? | emptyStatement)
    ;
...
MINUSMINUS:

A context-free grammar is not powerful enough to ensure full-fledged support of these rules, because a MySQL client can use the DELIMITER command to set the current delimiter. In this case, it is required to memorize and use the delimiter in other rules. Thus, if we use the DELIMITER directive, SQL statements that are written correctly will not be recognized by the grammar under discussion.
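To see why the delimiter has to be remembered, consider a naive statement splitter that runs before parsing. The article does not describe such a step; this Python sketch is only an assumption-laden illustration. It recognizes a delimiter only at the end of a line, ignores string literals and comments, and the function name split_statements is made up.

```python
def split_statements(script):
    """Split a script into statements, honoring DELIMITER directives."""
    delimiter = ";"   # state that a context-free rule cannot carry around
    statements = []
    buffer = []
    for line in script.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("DELIMITER "):
            # Remember the new delimiter for all following statements.
            delimiter = stripped.split(None, 1)[1]
            continue
        buffer.append(line)
        joined = "\n".join(buffer).strip()
        if joined.endswith(delimiter):
            statements.append(joined[:-len(delimiter)].strip())
            buffer = []
    rest = "\n".join(buffer).strip()
    if rest:
        statements.append(rest)
    return statements
```

After "DELIMITER //", the semicolons inside a procedure body no longer end the statement; only "//" does. That dependency on earlier input is exactly what the grammar cannot express.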

Types of MySQL statements

MySQL statements can be of the following types:
  • DDL statements
  • DML statements
  • Transaction statements
  • Replication statements
  • Prepared statements
  • Server administration statements
  • Utility statements
  • Procedural extension statements

The root rule for statements, based on the MySQL documentation, looks as follows:

sqlStatement
    : ddlStatement | dmlStatement | transactionStatement
    | replicationStatement | preparedStatement
    | administrationStatement | utilityStatement
    ;

There is also an empty statement that consists of a single semicolon:

emptyStatement
    : SEMI
    ;

SEMI: ';';

The subsequent chapters of the official documentation have been similarly transformed into ANTLR rules.

SELECT

The SELECT statement is probably the most interesting and wide-ranging statement in both MySQL and SQL in general. When writing the grammar, we focused mainly on the following tasks:
  • Description of tables
  • Description of expressions
  • Combination using UNION

Let us start with the description of tables. MySQL has a rather elaborate description of what can be used in the FROM clause of SELECT queries (which we will call "table references" here). Careful study and testing on actively used versions reveal that table references have the following structure:

Table object 1, Table object 2, …, Table object N
where a "table object" is one of four structures:
  • A separate table
  • Joined tables
  • A subquery
  • Table references in parentheses

Working from the specific to the general, a table object is defined inductively: it is either a table or a structure built from tables. The latter can be:
  • Joined tables
  • A subquery
  • A sequence of table objects in parentheses

Then a sequence of table objects, containing at least one table object, is recognized in the FROM clause. Of course, the grammar describes additional structures, such as join conditions and references to partitions (PARTITION), but the general structure is as follows:

tableSources
    : tableSource (',' tableSource)*
    ;

tableSource
    : tableSourceItem joinPart*
    | '(' tableSourceItem joinPart* ')'
    ;

tableSourceItem
    : tableName
      (PARTITION '(' uidList ')' )? (AS? alias=uid)?
      (indexHint (',' indexHint)* )?                  #atomTableItem
    | (subquery | '(' parenthesisSubquery=subquery ')')
      AS? alias=uid                                   #subqueryTableItem
    | '(' tableSources ')'                            #tableSourcesItem
    ;

Expressions

Expressions are widely used in MySQL wherever a value (or a vector of values) has to be evaluated. Inductively, an expression can be defined as follows:
  • An expression is any lexeme that is:
    • a constant (literal) value
    • a variable
    • an object identifier
  • An expression is a composition of expressions that have been combined by transformations.
Transformations include operations, operators (including set-theoretic and comparison operators), functions, queries, and parentheses.
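
The inductive definition maps naturally onto a recursive data type. The following Python sketch is only one possible way to model it; the class names Literal, Variable, Identifier, and Apply and the function depth are assumptions made for the illustration and do not come from the grammar.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: object          # a constant value


@dataclass(frozen=True)
class Variable:
    name: str              # a variable such as @tax


@dataclass(frozen=True)
class Identifier:
    name: str              # an object identifier such as a column name


@dataclass(frozen=True)
class Apply:
    transformation: str    # an operator, a function name, and so on
    operands: Tuple["Expr", ...]


Expr = Union[Literal, Variable, Identifier, Apply]


def depth(expr):
    """Nesting depth: 0 for a lexeme, one more for every enclosing transformation."""
    if isinstance(expr, Apply):
        return 1 + max((depth(op) for op in expr.operands), default=0)
    return 0


# price * (1 + @tax)
example = Apply("*", (Identifier("price"), Apply("+", (Literal(1), Variable("@tax")))))
```

Here the base cases are the three kinds of lexemes, and Apply is the inductive step that combines already built expressions.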

UNION

Unlike other dialects, MySQL has only two set-theoretic operations on tables. The first one, JOIN, has already been considered. Empirically, we found that the description of UNION in the official documentation is incomplete. We built upon it in the following way:

selectStatement
    : querySpecification lockClause?                #simpleSelect
    | queryExpression lockClause?                   #parenthesisSelect
    | querySpecificationNointo unionStatement+
        (
          UNION (ALL | DISTINCT)?
          (querySpecification | queryExpression)
        )?
        orderByClause? limitClause? lockClause?     #unionSelect
    | queryExpressionNointo unionParenthesis+
        (
          UNION (ALL | DISTINCT)?
          queryExpression
        )?
        orderByClause? limitClause? lockClause?     #unionParenthesisSelect
    ;

If UNION is used, individual queries can be enclosed in parentheses. Using parentheses is not essential, unless queries use ORDER BY and LIMIT. However, if the first query in UNION is in parentheses, they should be used for all subsequent queries as well.

Incorrect:

(select 1) union select 2;
(select 1) union (select 2) union select 3;

Correct:

(((select 1))) union (select 2);
(select 1) union ((select 2)) union (select 3);
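
The parenthesis rule stated above can be written as a tiny check. This Python sketch assumes that the UNION has already been split into the texts of its individual queries; the function name parentheses_consistent is hypothetical, and the check covers only this one rule, not the full grammar.

```python
def parentheses_consistent(operands):
    """Check the rule: if the first query is parenthesized, all the others must be too.

    operands: texts of the individual queries of one UNION, in order.
    """
    if not operands or not operands[0].lstrip().startswith("("):
        # The rule constrains only unions whose first query is in parentheses.
        return True
    return all(op.lstrip().startswith("(") for op in operands[1:])
```

For the examples above, the two incorrect unions are rejected and the two correct ones are accepted.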

Use of grammar

A grammar is written to solve tasks relevant to syntactic and lexical analysis. On the one hand, recognition is required to be performed as quickly as possible; on the other hand, any developed applications should be able to take advantage of lexer and parser code without compromising functionality and performance.

An application that uses a parser most likely applies either the Visitor or the Observer (listener) design pattern. Both patterns involve analysis of a defined subset of the nodes of a parse tree. Nodes of a parse tree, other than leaf nodes, correspond to certain syntax rules. To analyze a node of a parse tree, we must look at its child nodes, individually or in groups, which correspond to fragments of the parent rule.

A critical condition for developing a good grammar is the ability to gain "easy" access to any part of the rule. Intuitively, "easy" access can be described as the possibility to get any given part as an object without searching and iterating. This is implemented in ANTLR by means of such entities as alternative and element labels. Alternative labels allow splitting a complex rule into alternative phrases and, if the Visitor pattern is used, processing each of them using a separate method. For example, a table object in MySQL can be defined by the following rule:

tableSourceItem
    : tableName
      (PARTITION '(' uidList ')' )? (AS? alias=uid)?
      (indexHint (',' indexHint)* )?
    | (subquery | '(' parenthesisSubquery=subquery ')')
      AS? alias=uid
    | '(' tableSources ')'
    ;

We can see that a table object is defined as one of three possible variants:
  • A table
  • A subquery
  • A sequence of table objects in parentheses

Therefore, instead of processing the whole structure, alternative labels are used, creating the possibility to process each variant independently of the others:

tableSourceItem
    : tableName
      (PARTITION '(' uidList ')' )? (AS? alias=uid)?
      (indexHint (',' indexHint)* )?                  #atomTableItem
    | (subquery | '(' parenthesisSubquery=subquery ')')
      AS? alias=uid                                   #subqueryTableItem
    | '(' tableSources ')'                            #tableSourcesItem
    ;
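
How alternative labels lead to one visitor method per variant can be sketched without the ANTLR runtime. In the Python code below, the context classes are simplified stand-ins for generated code, not the real generated classes; the visitor method names follow the labels of the rule, and TableKindVisitor and classify are invented for the example.

```python
class TableSourceItemContext:
    """Stand-in for a parse-tree context of the tableSourceItem rule."""

    def __init__(self, text):
        self.text = text


class AtomTableItemContext(TableSourceItemContext):
    def accept(self, visitor):
        return visitor.visitAtomTableItem(self)


class SubqueryTableItemContext(TableSourceItemContext):
    def accept(self, visitor):
        return visitor.visitSubqueryTableItem(self)


class TableSourcesItemContext(TableSourceItemContext):
    def accept(self, visitor):
        return visitor.visitTableSourcesItem(self)


class TableKindVisitor:
    """Each labeled alternative is handled by its own method."""

    def visitAtomTableItem(self, ctx):
        return ("table", ctx.text)

    def visitSubqueryTableItem(self, ctx):
        return ("subquery", ctx.text)

    def visitTableSourcesItem(self, ctx):
        return ("nested", ctx.text)


def classify(items):
    """Dispatch every table object to the method matching its alternative."""
    visitor = TableKindVisitor()
    return [item.accept(visitor) for item in items]
```

No method has to inspect the children to find out which variant it received; the dispatch has already made that decision.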

Element labels are used to label individual nonterminals and their sequences. They make the content of a rule context accessible as a field with a given name. Thus, instead of extracting an element from the context by hand, it is enough to refer to its label. Manual extraction, by contrast, depends on the rule structure: the more elaborate the rule, the more complicated the extraction becomes.

For example, the rule:

loadXmlStatement
    : LOAD XML
      (LOW_PRIORITY | CONCURRENT)?
      LOCAL? INFILE STRING_LITERAL
      (REPLACE | IGNORE)?
      INTO TABLE tableName
      (CHARACTER SET charsetName)?
      (ROWS IDENTIFIED BY '<' STRING_LITERAL '>')?
      ( IGNORE decimalLiteral (LINES | ROWS) )?
      ( '(' assignmentField (',' assignmentField)* ')' )?
      (SET updatedElement (',' updatedElement)*)?
    ;

requires extracting the tag name that identifies the rows imported by the LOAD XML statement. We also need to determine the conditions that define the specific form of the statement:
  • Is a priority explicitly set for the statement? If so, which one?
  • Which mode (REPLACE or IGNORE) is used for inserting rows?
  • Which syntax (LINES or ROWS) is used when several initial rows are skipped during import?

To obtain the required values immediately in code without any extraction, element labels can be used:

loadXmlStatement
    : LOAD XML
      priority=(LOW_PRIORITY | CONCURRENT)?
      LOCAL? INFILE file=STRING_LITERAL
      violation=(REPLACE | IGNORE)?
      INTO TABLE tableName
      (CHARACTER SET charsetName)?
      (ROWS IDENTIFIED BY '<' tag=STRING_LITERAL '>')?
      ( IGNORE decimalLiteral linesFormat=(LINES | ROWS) )?
      ( '(' assignmentField (',' assignmentField)* ')' )?
      (SET updatedElement (',' updatedElement)*)?
    ;

Labels simplify not only the code of the target application but also the grammar itself, because label names improve readability.
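
What labeled access looks like from the application side can be sketched as follows. The Python class below is a hypothetical stand-in for the context of loadXmlStatement: its fields carry the names of the element labels, but they hold plain strings here, whereas a real parse tree would hold token objects. The describe function and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadXmlContext:
    """Stand-in for a rule context; field names mirror the element labels."""
    file: str
    priority: Optional[str] = None      # LOW_PRIORITY or CONCURRENT, if given
    violation: Optional[str] = None     # REPLACE or IGNORE, if given
    tag: Optional[str] = None           # tag name from ROWS IDENTIFIED BY
    linesFormat: Optional[str] = None   # LINES or ROWS, if rows are skipped


def describe(ctx):
    """Answer the questions about the statement directly from the labels."""
    return {
        "file": ctx.file,
        "priority": ctx.priority or "default",
        "on_duplicate": ctx.violation or "default",
        "row_tag": ctx.tag,
        "skip_unit": ctx.linesFormat,
    }


ctx = LoadXmlContext(file="'data.xml'", priority="LOW_PRIORITY", tag="'row'")
```

Each value is read by name, so the code does not change if optional parts are added to or reordered in the rule.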

Conclusion

Developing grammars for SQL languages is quite challenging, because they are case-insensitive and contain a large number of keywords, ambiguities, and context-sensitive structures. In particular, while developing our MySQL grammar, we implemented processing of special types of comments, developed a lexer able to differentiate between identifiers with a dot and real literals, and wrote the parser grammar, which covers the majority of the MySQL syntax described in the documentation. Our MySQL grammar can be used to recognize queries generated by WordPress and Bitrix, as well as by other applications that do not require exact processing of context-sensitive cases. The grammar source files are available in the official grammar repository under the MIT license.

Author: Ivan Khudyashov, Positive Technologies

Famous cryptographers’ tombstone cryptogram decrypted

Sophos Naked Security - 22 January 2018 - 15:00
A paper at ShmooCon 2018 over the weekend revealed a delightful cryptogram on William and Elizebeth Friedman's tombstone.