Skip to content
Snippets Groups Projects
Unverified Commit af012446 authored by Michael White's avatar Michael White Committed by GitHub
Browse files

Merge pull request #17 from dmhowcroft/patch-1

added links to @dmhowcroft's introductory posts
parents 15f7d1df 2563ab52
No related branches found
No related tags found
No related merge requests found
# Introduction
# OpenCCG
See CHANGES for a description of the project status. Also see the web site and wiki at UT Austin:
OpenCCG is a system for parsing and generating text using [combinatory categorial grammar](https://en.wikipedia.org/wiki/Combinatory_categorial_grammar) for syntax and [hybrid logic dependency semantics](https://www.aclweb.org/anthology/P02-1041) for, well, the semantic representation.
If that seems like a mouthful, don't worry too much about the details right now.
You can get started [installing OpenCCG](https://davehowcroft.com/post/installing-openccg/) and [working with OpenCCG using the `tccg` utility](https://davehowcroft.com/post/getting-started-with-openccg/) right now.
If, on the other hand, you want to start understanding what that mouthful means, Johanna Moore at the University of Edinburgh has some [helpful course notes on NLG in general and OpenCCG in particular](https://www.inf.ed.ac.uk/teaching/courses/nlg/).
# Project information
See CHANGES for a description of the project status. Also see the OpenCCG web site and wiki at UT Austin:
* http://openccg.sf.net
* http://www.utcompling.com/wiki/openccg
This file contains the configuration and build instructions. Next you'll probably want to look at the tutorial on writing grammars in the human-friendly 'dot ccg' syntax on [the UT Austin OpenCCG wiki](http://www.utcompling.com/wiki/openccg/visccg-tutorial).
This `README.md` file contains the configuration and build instructions. Next you'll probably want to look at the tutorial on writing grammars in the human-friendly 'dot ccg' syntax on [the UT Austin OpenCCG wiki](http://www.utcompling.com/wiki/openccg/visccg-tutorial).
After that it may be helpful to look at the "native" grammar specification in "Specifying Grammars for OpenCCG: A Rough Guide" in docs/grammars-rough-guide.pdf, as well as the SAMPLE_GRAMMARS file for descriptions of the sample grammars that come with the distribution, including ones using the DotCCG syntax. A (somewhat dated) programmer's guide to using the OpenCCG realizer appears in docs/realizer-manual.pdf.
After that it may be helpful to look at the "native" grammar specification in "Specifying Grammars for OpenCCG: A Rough Guide" in `docs/grammars-rough-guide.pdf`, as well as the `SAMPLE_GRAMMARS` file for descriptions of the sample grammars that come with the distribution, including ones using the DotCCG syntax. A (somewhat dated) programmer's guide to using the OpenCCG realizer appears in `docs/realizer-manual.pdf`.
This release also includes a broad English coverage grammar from the CCGBank and associated statistical models; see docs/ccgbank-README for details.
This release also includes a broad English coverage grammar from the CCGBank and associated statistical models; see `docs/ccgbank-README` for details.
# Requirements
......@@ -20,16 +29,16 @@ This release also includes a broad English coverage grammar from the CCGBank and
# Libraries
If you're working with the latest source version from github, you'll need to download the external libraries from the latest release, as github discourages including binaries in their repos:
If you're working with the latest source version from GitHub, you'll need to download the external libraries from the latest release, as GitHub discourages including binaries in their repos:
* Download the latest release from sourceforge (https://sourceforge.net/projects/openccg/)
* Unpack the archive and copy over the files from openccg/lib/, as well as openccg/ccgbank/bin/ner/NERApp.jar
* Download the [latest release of OpenCCG from sourceforge](https://sourceforge.net/projects/openccg/)
* Unpack the archive and copy over the files from `openccg/lib/`, as well as `openccg/ccgbank/bin/ner/NERApp.jar`
* Build the latest source as described further below
# Configuring your environment variables
The easiest thing to do is to set the environment variables JAVA_HOME and OPENCCG_HOME to the relevant locations on your system. Set JAVA_HOME to match the top level directory containing the Java installation you want to use.
The easiest thing to do is to set the environment variables `JAVA_HOME` and `OPENCCG_HOME` to the relevant locations on your system. Set `JAVA_HOME` to match the top level directory containing the Java installation you want to use.
For example, on Windows:
......@@ -48,9 +57,9 @@ or on Unix:
On Windows, to get these settings to persist, it's actually easiest to set your environment variables through the System Properties from the Control Panel. For example, under WinXP, go to Control Panel, click on System Properties, choose the Advanced tab, click on Environment Variables, and add your settings in the User variables area.
Next, likewise set OPENCCG_HOME to be the top level directory where you unzipped the download. In Unix, type 'pwd' in the directory where this file is and use the path given to you by the shell as OPENCCG_HOME. You can set this in the same manner as for JAVA_HOME above.
Next, likewise set `OPENCCG_HOME` to be the top level directory where you unzipped the download. In Unix, type `pwd` in the directory where this file is and use the path given to you by the shell as `OPENCCG_HOME`. You can set this in the same manner as for `JAVA_HOME` above.
Next, add the directory OPENCCG_HOME/bin to your path. For example, you can set the path in your .bashrc file as follows:
Next, add the directory `OPENCCG_HOME/bin` to your path. For example, you can set the path in your `.bashrc` file as follows:
```
> export PATH="$PATH:$OPENCCG_HOME/bin"
......@@ -58,7 +67,7 @@ Next, add the directory OPENCCG_HOME/bin to your path. For example, you can set
On Windows, you should also add the python main directory to your path.
Finally, if you are going to use KenLM with very large language models for realization with CCGbank-extracted grammars on linux, you'll also need to set the library load path:
Finally, if you are going to use [KenLM](https://kheafield.com/code/kenlm/) with very large language models for realization with CCGbank-extracted grammars on linux, you'll also need to set the library load path:
```
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OPENCCG_HOME/lib
......@@ -66,16 +75,16 @@ Finally, if you are going to use KenLM with very large language models for reali
Once you have taken care of these things, you should be able to build and use the OpenCCG Library.
Note: Spaces are allowed in JAVA_HOME but not in OPENCCG_HOME. To set an environment variable with spaces in it, you need to put quotes around the value when on Unix, but you must *NOT* do this when under Windows.
**Note**: Spaces are allowed in `JAVA_HOME` but not in `OPENCCG_HOME`. To set an environment variable with spaces in it, you need to put quotes around the value when on Unix, but you must *NOT* do this when under Windows.
# Increasing Java memory limit
If you're working with a broad coverage grammar and statistical parsing or realization models, you'll probably need to increase the default memory limit for running OpenCCG's tools. You can do so by editing bin/ccg-env[.bat], increasing the JAVA_MEM environment variable at the end of this script. For training perceptron models in memory, you may need 16g; for realization with the very large gigaword 5-gram model, you may need 8g; otherwise, for parsing and realization with ccgbank-derived models, 4g or possibly even 2g should suffice; finally, for small grammars 512m or 256m should be ok.
If you're working with a broad coverage grammar and statistical parsing or realization models, you'll probably need to increase the default memory limit for running OpenCCG's tools. You can do so by editing `bin/ccg-env[.bat]`, increasing the JAVA_MEM environment variable at the end of this script. For training perceptron models in memory, you may need 16g; for realization with the very large gigaword 5-gram model, you may need 8g; otherwise, for parsing and realization with CCGbank-derived models, 4g or possibly even 2g should suffice; finally, for small grammars 512m or 256m should be ok.
# Trying it out
If you've managed to configure the system, you should be able to change to the directory for the "tiny" sample grammar and run "tccg" (for text ccg), the command-line tool for interactively testing grammars:
If you've managed to configure the system, you should be able to change to the directory for the "tiny" sample grammar and run `tccg` (for text ccg), the command-line tool for interactively testing grammars:
```
> cd grammars
......@@ -83,9 +92,9 @@ If you've managed to configure the system, you should be able to change to the d
> tccg (Windows/Unix)
```
Provided tccg starts properly, it loads the grammar files, parses them, and shows the command-line interface (at which point you can type :h for help or :q to quit).
Provided tccg starts properly, it loads the grammar files, parses them, and shows the command-line interface (at which point you can type `:h` for help or `:q` to quit).
If you trouble starting up tccg, make sure you have set the environment variables properly, and that the tccg script (located in openccg/bin) calls the right shell environment (top-line of the script; to solve the problem, either comment out this line or correct the path).
If you trouble starting up tccg, make sure you have set the environment variables properly, and that the tccg script (located in `openccg/bin`) calls the right shell environment (top-line of the script; to solve the problem, either comment out this line or correct the path).
# Visualizing semantic graphs
......@@ -99,9 +108,9 @@ Semantic dependency graphs in testbed files can be visualized with the help of G
> ccg-draw-graph -i tb.xml -v graphs/g
```
You can also show the semantic classes or word indices using the -c or -w options, respectively. The graphs can be displayed with any PDF display tool.
You can also show the semantic classes or word indices using the `-c` or `-w` options, respectively. The graphs can be displayed with any PDF display tool.
Note that the graph visualization requires the logical forms to be stored in an xml node-rel format for graphs, as in the worldcup or routes sample grammars. See SAMPLE_GRAMMARS for more information.
Note that the graph visualization requires the logical forms to be stored in an xml node-rel format for graphs, as in the worldcup or routes sample grammars. See `SAMPLE_GRAMMARS` for more information.
# Creating disjunctive logical forms
......@@ -110,14 +119,14 @@ This release includes a new disjunctivizer package, for creating a disjunctive L
# Generating grammar documentation
OpenCCG includes a tool for generating HTML documentation of the XML files that specify a grammar. It can be run either from the ccg-grammardoc script in the bin/ directory, or as an Ant task. An example of how to incorporate GrammarDoc into an Ant build file is given in the "tiny" grammar (grammars/tiny/build.xml), in a build target called "document."
OpenCCG includes a tool for generating HTML documentation of the XML files that specify a grammar. It can be run either from the `ccg-grammardoc` script in the `bin/` directory, or as an Ant task. An example of how to incorporate GrammarDoc into an Ant build file is given in the "tiny" grammar (`grammars/tiny/build.xml`), in a build target called `document`.
# Building the system from source
The OpenCCG build system is based on Apache Ant. Ant is a little but very handy tool that uses a build file written in XML (build.xml) as building instructions. Building the Java portion of OpenCCG is accomplished using the script `ccg-build'; this works under Windows and Unix, but requires that you run it from the top-level directory (where the build.xml file is located). If everything is right and all the required packages are visible, this action will generate a file called openccg.jar in the ./lib directory.
The OpenCCG build system is based on Apache Ant. Ant is a little but very handy tool that uses a build file written in XML (`build.xml`) as building instructions. Building the Java portion of OpenCCG is accomplished using the script `ccg-build`; this works under Windows and Unix, but requires that you run it from the top-level directory (where the `build.xml` file is located). If everything is right and all the required packages are visible, this action will generate a file called openccg.jar in the `./lib` directory.
Note that you should *not* build from source by invoking 'ant' directly. Instead, you should use ccg-build as shown below (Unix), after ensuring that you've set OPENCCG_HOME, JAVA_HOME and updated your PATH (the ccg-build script invokes ant with various parameters that aren't set properly if ant is invoked from the command line):
Note that you should *not* build from source by invoking 'ant' directly. Instead, you should use `ccg-build` as shown below (Unix), after ensuring that you've set `OPENCCG_HOME`, `JAVA_HOME` and updated your `PATH` (the `ccg-build` script invokes ant with various parameters that aren't set properly if ant is invoked from the command line):
```
> cd $OPENCCG_HOME
......@@ -126,9 +135,9 @@ Note that you should *not* build from source by invoking 'ant' directly. Instea
# Working with the Eclipse IDE
The Eclipse IDE can be used for editing the Java source code, though setup can be a bit tricky. The most reliable method seems to be as follows. First, follow the instructions above for building the source from the command line. Then, in Eclipse, choose File|New|Java Project to create a new Java Project, and give it a name, such as 'openccg'. Leave the default settings as they are, and click Next. Then choose Link Additional Source and browse to the folder 'src' in the directory where you installed OpenCCG (ie $OPENCCG_HOME/src). You'll need to give this location a new name, such as 'src2' ('src' is already taken by default). The final step is to Add External JARs under the Libraries tab. From OpenCCG's lib directory (ie $OPENCCG_HOME/lib), choose all of the .jar files. At this point, you should be able to hit Finish and the code should compile in Eclipse.
The Eclipse IDE can be used for editing the Java source code, though setup can be a bit tricky. The most reliable method seems to be as follows. First, follow the instructions above for building the source from the command line. Then, in Eclipse, choose File|New|Java Project to create a new Java Project, and give it a name, such as 'openccg'. Leave the default settings as they are, and click Next. Then choose Link Additional Source and browse to the folder `src/` in the directory where you installed OpenCCG (i.e. `$OPENCCG_HOME/src`). You'll need to give this location a new name, such as 'src2' ('src' is already taken by default). The final step is to Add External JARs under the Libraries tab. From OpenCCG's lib directory (i.e. `$OPENCCG_HOME/lib`), choose all of the `.jar` files. At this point, you should be able to hit Finish and the code should compile in Eclipse.
Note that with Eclipse's default settings, the code will compile in your Eclipse workspace, which is separate from your OpenCCG installation (this is a good thing, as Eclipse uses a 'bin' directory for compiled Java classes, whereas OpenCCG uses 'bin' for command-line scripts). Thus, once you have made a round of changes in Eclipse and are ready to try them out in OpenCCG, go back to the command line in $OPENCCG_HOME and invoke ccg-build to re-build the openccg.jar file. This will make your changes available in OpenCCG's programs, such as tccg.
Note that with Eclipse's default settings, the code will compile in your Eclipse workspace, which is separate from your OpenCCG installation (this is a good thing, as Eclipse uses a `bin/` directory for compiled Java classes, whereas OpenCCG uses `bin/` for command-line scripts). Thus, once you have made a round of changes in Eclipse and are ready to try them out in OpenCCG, go back to the command line in `$OPENCCG_HOME` and invoke `ccg-build` to re-build the `openccg.jar` file. This will make your changes available in OpenCCG's programs, such as `tccg`.
# Bug Reports
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment