path: root/tools/eos-html-extractor
Commit message (Author, Age)
* Handle HTML with embedded tags (Philip Chimento, 2015-06-18)

  When an element, such as <p>, has a name="translatable" attribute, we also want to grab markup tags inside the element and translate them as well. For example, previously this HTML:

      <p name="translatable">An embedded <b>tag</b> in a paragraph</p>

  would result in the following string being extracted:

      _("An embedded");

  However, we want it to be:

      _("An embedded <b>tag</b> in a paragraph");

  This removes the use of BeautifulSoup from the eos-html-extractor script. Unfortunately BeautifulSoup could have done this quite easily, but it does not provide any line number information, which we need. Previously in order to get the line numbers we also used html.parser from Python's standard library, to augment the data we got from BeautifulSoup. However, this issue required html.parser to do all the work that BeautifulSoup did anyway, so there is no reason to use BeautifulSoup anymore.

  [endlessm/eos-sdk#3291]
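  The approach described in this commit can be sketched with html.parser alone. This is a minimal illustration, not the script's actual code: the class name TranslatableSketchParser and its attributes are hypothetical. It does not handle void elements written without a trailing slash (such as <br>), and character references in the text are decoded rather than preserved.

```python
from html.parser import HTMLParser


class TranslatableSketchParser(HTMLParser):
    """Collect the inner markup of elements marked name="translatable".

    Hypothetical class for illustration; not the script's real parser.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.results = []   # list of (line_number, markup) tuples
        self._depth = 0     # nesting depth inside a translatable element
        self._parts = []    # pieces of the string being collected
        self._line = None   # line on which the translatable element opened

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested tag: keep its raw source text and track the nesting.
            self._parts.append(self.get_starttag_text())
            self._depth += 1
        elif ('name', 'translatable') in attrs:
            self._depth = 1
            self._parts = []
            # getpos() returns (line, column) of the tag being handled.
            self._line = self.getpos()[0]

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if self._depth:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self._depth -= 1
        if self._depth:
            # End of a nested tag: keep it as part of the string.
            self._parts.append('</{}>'.format(tag))
        else:
            # End of the translatable element itself: emit the string.
            self.results.append((self._line, ''.join(self._parts)))

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)
```

  Feeding the example paragraph to this parser yields the full string, embedded <b> tags included, together with the line number that BeautifulSoup could not supply.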
* Handle quotes in HTML strings (Philip Chimento, 2015-06-18)

  When generating the .dummy.c file, eos-html-extractor previously did not escape quotes correctly.

  [endlessm/eos-sdk#3291]
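  One way to escape a string for a C literal is sketched below. The helper name c_string_literal is hypothetical, and escaping backslashes as well as quotes is an assumption; the commit message only mentions quotes.

```python
def c_string_literal(text):
    """Wrap text in a gettext-style _("...") call for a C source file.

    Hypothetical helper for illustration only.
    """
    # Escape backslashes first, so the backslashes added for quotes
    # are not themselves doubled.
    escaped = text.replace('\\', '\\\\').replace('"', '\\"')
    return '_("{}");'.format(escaped)
```

  With this sketch, an attribute such as name="translatable" inside an extracted string no longer terminates the C string early.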
* Handle excess whitespace in strings (Philip Chimento, 2015-06-18)

  Whitespace between words and tags doesn't matter to HTML. Indeed, the text in a translatable element may be formatted any way over any number of lines, so we normalize all consecutive whitespace to be just one space character and strip whitespace from the beginning and end of the strings. This is so that translators are not confronted with strange newlines and whitespace on Transifex.

  [endlessm/eos-sdk#3291]
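  The normalization described above can be expressed in a few lines. This is a sketch under the assumption that a regular expression is used; the function name normalize_whitespace is hypothetical.

```python
import re


def normalize_whitespace(text):
    """Collapse each run of whitespace to one space and trim both ends."""
    return re.sub(r'\s+', ' ', text).strip()
```

  A paragraph wrapped over several indented lines in the HTML source then reaches translators as a single line.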
* Ensure stdout is UTF-8 (Philip Chimento, 2015-06-05)

  This gets the underlying byte stream of sys.stdout and wraps it in a UTF-8 encoder. That is then used as the default output file rather than sys.stdout itself, which on Jenkins may not have a default encoding of UTF-8.

  [endlessm/eos-sdk#3245]
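  A possible shape for the wrapping step is shown below, assuming codecs.getwriter is used; the commit does not say which standard-library wrapper was chosen, and the function name utf8_writer is hypothetical.

```python
import codecs


def utf8_writer(byte_stream):
    """Return a text writer that encodes everything written as UTF-8.

    byte_stream is any binary file object, for example sys.stdout.buffer.
    """
    return codecs.getwriter('utf-8')(byte_stream)
```

  A script would call utf8_writer(sys.stdout.buffer) once and write to the result, so the output encoding no longer depends on the locale of the build machine.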
* Set encoding on input and output files (Philip Chimento, 2015-06-05)

  Since this is Python 3, we can specify at file-open time what the encoding of the input and output files is to be. This should fix any build errors with non-ASCII characters in an ASCII terminal environment.

  [endlessm/eos-sdk#3245]
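  In Python 3 the encoding is an argument to open(). The helpers below are a sketch; the names read_text and write_text are hypothetical, and UTF-8 is assumed for both input and output.

```python
def read_text(path):
    """Read a file as UTF-8, regardless of the locale's default encoding."""
    with open(path, encoding='utf-8') as infile:
        return infile.read()


def write_text(path, text):
    """Write text to a file as UTF-8, regardless of the locale."""
    with open(path, 'w', encoding='utf-8') as outfile:
        outfile.write(text)
```

  Without the explicit argument, open() falls back to the locale's preferred encoding, which is what fails in an ASCII environment.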
* html-extractor: be more explicit about encoding conversion (Cosimo Cecchi, 2015-06-04)

  Try to fix a Jenkins test failure.
* Port eos-html-extractor to Python 3 (Philip Chimento, 2015-06-04)

  May as well be forward compatible, while we're touching this.

  [endlessm/eos-sdk#3245]
* Fix bug with no comments (Philip Chimento, 2015-06-04)

  eos-html-extractor would crash if it encountered some text before it had encountered a comment.

  [endlessm/eos-sdk#3245]
* Avoid global state (Philip Chimento, 2015-06-04)

  Another minor cleanup; TranslatableHTMLParser shouldn't use global state.

  [endlessm/eos-sdk#3245]
* Avoid unnecessary imports (Philip Chimento, 2015-06-04)

  We don't need to import all of os, just os.path; and urllib was not necessary for reading the file.

  [endlessm/eos-sdk#3245]
* eos-html-extractor PEP8 and consistency cleanup (Philip Chimento, 2015-06-04)

  [endlessm/eos-sdk#3245]
* Port eos-html-extractor to use argparse (Philip Chimento, 2015-06-04)

  For more consistency in command line argument handling.

  [endlessm/eos-sdk#3245]
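  For readers unfamiliar with argparse, a command-line interface of this kind might look like the sketch below. The argument names input_file and --output are assumptions made for illustration; the log does not list the script's real options.

```python
import argparse


def build_parser():
    """Build a command-line parser with hypothetical arguments."""
    parser = argparse.ArgumentParser(
        description='Extract translatable strings from an HTML file.')
    # Hypothetical positional argument: the HTML file to scan.
    parser.add_argument('input_file', help='HTML file to scan')
    # Hypothetical option: where to write the result.
    parser.add_argument('-o', '--output', default=None,
                        help='file to write to (default: standard output)')
    return parser
```

  argparse generates the usage message and --help output from these declarations, which is where the consistency comes from.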
* Add eos-html-extractor and m4 file (Philip Chimento, 2015-06-04)

  This is taken almost directly from the existing version in eos-english. Cleanups to follow in subsequent commits.

  Previously the m4 code was in two separate macros, but since they were much the same, I combined them into one macro.

  This also adds a very minimal test for eos-html-extractor; basically as a very quick regression test for the cleanups to follow.

  [endlessm/eos-sdk#3245]