Design a constraint DSL for synthesizing from meta grammar

Came over this idea when reading and playing with activity-focused natural language grammar Playing with activity-focused natural language grammar.

What I was first thinking of was how you measure the probability of a sentence in natural language. Then, I was thinking of measuring the probability of a set of sentences, and realized this could be used for problem solving.

The problem is that you need a general way to define a set of sentences. This could be done through a constraint DSL, which defines constraints for a language defined with meta grammar.

Examples of constraints:

  • A floating number between 0 and 1
  • A positive integer
  • Any items in this list

For example:

0 doc = ["(" .$:"x" ", " .$:"y" ")"]

This parses a single line of 2D coordinates, e.g. (2.4, 5.2).

The self syntax for the meta grammar can be used to parse how the grammar above is parsed. From this you can learn that in the rule doc, there is a number x and y. So, you can create a document that lists up all “variable slots” of the language, and this document can be modified manually or automatically to control the constraints of the generated text.

It could look something like this:

doc.x: [0, 1)
doc.y: (-1, 1)

A generator takes as input the meta grammar, the constraints, and outputs a random text that satisfy both the grammar and the constraints.

Machine learning

One idea is to combine this with machine learning techniques, for example GANs (Generative Adverserial Networks).

(choose constraints likely to generate text with bad predictions)
    |                                                 |
constraints + meta grammar                            |
    |                                                 |
generator -> text -> test -> reward ---------__       |
              |                                |      |
             networks -> predicted reward -> update networks
  • Network 1 predicts expected rewards using the text as input. To parse the text, it uses the meta grammar.
  • Network 2 updates the numbers in the constraints, trying to find combinations that leads to bad predictions in network 1.

Network 2 sampling

Network 2 might need to sample reward errors before updating. This is because without knowledge about the specific text used for the test, one assumes that the constraints are representative of that specific subset of the language.

Think of a specific text as a point in a larger space (the language) which boundaries are defined by the constraints. An update to move the space is arbitrary, unless one believes some function of the points give sufficient information to make a decision to move. Why change if there is no reason to believe that Network 1 will indeed make a better guess next time?

Alternative is to update very slowly, such that changes it reacts to are likely to generate text with a large overlap with the old set. However, since this introduces an overhead of generating new constraints, it is probably faster to sample using the same constraints.

It is not possible make a change in Network 2 and then replicate the test. Most likely, a complete different text will be generated that will result in a complete different reward. This gives the space a high number of dimensions where the gradient direction is very noisy.

A way to perform a probabilistic exhaustive check, to see whether space is covered sufficiently, is to generate a hash from the meta data of the text and use a Bloom filter. Then you sample from a moving average of new insertions into the Bloom filter (0 or 1), and change constraints when it falls below a threshold and Network 1 errors also have been sufficiently low.

Keep track of text with high reward

If the reward function gives a spike for a specific text, then there is no way for Network 2 to reproduce this case from the constraints. Network 1 must not forget this knowledge over time. To solve this one could sample from a collection of known texts with high rewards and feed them into Network 1 at regular intervals.

One problem is that Network 1 might forget the low rewards in the neighborhood around texts with high rewards. Even when you store the constraints with these texts, the state of Network 2 might have changed so it can no longer learn properly from the reward errors.

Therefore, only Network 1 can learn from history of successful cases.