Extracts HTML parts from a site
Version | 0.1 |
---|---|
Bundle | tools |
Categories | HTML |
Authors | Sirkku Karinen (sirkku.karinen@significo.fi) |
Issue tracker | View/Report issues |
Requires | Python; python-lxml (DEB) |
Source files | component.xml extract.py |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
html1 | HTMLFile | Mandatory | HTML site to extract |
html2 | HTMLFile | Optional | HTML site to extract |
htmlArray | Array<HTMLFile> | Optional | Array of HTML sites to extract |
Name | Type | Description |
---|---|---|
head | HTMLFile | Head part |
body | HTMLFile | Body part |
script | JavaScript | JavaScript in an external file, taken from the head part |
style | StyleSheet | CSS in an external file, taken from the head part |
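
One plausible reading of the script and style outputs is that they come from external files referenced in the head part. The sketch below illustrates that reading only; it is not taken from extract.py. It assumes well-formed markup and uses the standard-library `xml.etree.ElementTree` rather than python-lxml, and the function name `external_head_files` is hypothetical.

```python
import xml.etree.ElementTree as ET


def external_head_files(html_text):
    """Collect external script and stylesheet references found in <head>.

    Hypothetical helper: returns two lists of file references,
    (scripts, styles). Elements outside <head> are ignored.
    """
    root = ET.fromstring(html_text)
    head = root.find("head")
    scripts, styles = [], []
    if head is None:
        return scripts, styles
    # Iterating an element yields its direct children only.
    for node in head:
        if node.tag == "script" and node.get("src"):
            scripts.append(node.get("src"))
        elif (node.tag == "link" and node.get("rel") == "stylesheet"
              and node.get("href")):
            styles.append(node.get("href"))
    return scripts, styles


PAGE = (
    '<html><head>'
    '<script src="app.js"></script>'
    '<link rel="stylesheet" href="site.css" />'
    '</head><body><script src="late.js"></script></body></html>'
)

scripts, styles = external_head_files(PAGE)
print(scripts, styles)
```

In this sample the script referenced from the body (`late.js`) is not collected, because only the head part is inspected.
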
Name | Type | Default | Description |
---|---|---|---|
extractBody | string | "" | Element to extract from the body. If not set, the whole body part is extracted. Currently only elements directly under the body tag are extracted. |
extractHead | string | "" | Element to extract from the head. If not set, the whole head part is extracted. Currently only elements directly under the head tag are extracted. |
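
The effect of extractBody and extractHead can be sketched roughly as follows. This is an illustration under stated assumptions, not the component's extract.py: it uses the standard-library `xml.etree.ElementTree` instead of python-lxml, assumes well-formed markup, and the function name `extract_part` is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_part(html_text, part, element=""):
    """Return the markup of `part` ("head" or "body") as a string.

    If `element` is empty, the whole part is returned. Otherwise only
    matching elements that are direct children of the part are returned,
    mirroring the "directly under the tag" limitation described above.
    """
    root = ET.fromstring(html_text)
    container = root.find(part)
    if container is None:
        return ""
    if not element:
        return ET.tostring(container, encoding="unicode")
    # findall() with a bare tag name matches direct children only.
    matches = container.findall(element)
    return "".join(ET.tostring(m, encoding="unicode") for m in matches)


SAMPLE = (
    "<html><head><title>Demo</title></head>"
    "<body><h1>Top</h1><div><h1>Nested</h1></div></body></html>"
)

# Comparable to extractBody=h1: only the top-level <h1> is returned.
print(extract_part(SAMPLE, "body", "h1"))
# Comparable to leaving extractHead unset: the whole head is returned.
print(extract_part(SAMPLE, "head"))
```

The nested `<h1>` inside the `<div>` is skipped in the first call, which corresponds to the current limitation that only elements directly under the body or head tag are extracted.
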
Test case | Parameters | IN html1 | IN html2 | IN htmlArray | OUT head | OUT body | OUT script | OUT style |
---|---|---|---|---|---|---|---|---|
case1 | (missing) | html1 | (missing) | (missing) | (missing) | (missing) | script | style |
case2 | properties | html1 | (missing) | (missing) | (missing) | (missing) | script | style |
extractBody=h1, |