Discussion:
bug#32528: http-post breaks with XML response payload containing boundary
Ricardo Wurmus
2018-08-25 08:49:19 UTC
Hi Guilers,

I’m having a problem with http-post and I think it might be a bug. I’m
talking to a Debbugs SOAP service over HTTP by sending (via POST) an XML
request. The Debbugs SOAP service responds with a string of XML.

Here’s a simplified version of what I do:

(use-modules (web client) (sxml simple) (ice-9 receive))
(let ((req-xml "<soap:Envelope xmlns:soap...>"))
  (receive (response body)
      (http-post uri
                 #:body req-xml
                 #:headers
                 `((content-type . (text/xml))
                   (content-length . ,(string-length req-xml))))
    ;; Do something with the response body
    (xml->sxml body #:trim-whitespace? #t)))

This fails for some requests with an error like this:

web/http.scm:1609:23: Bad Content-Type header: multipart/related; type="text/xml"; start="<main_envelope>"; boundary="=-=-="

Here’s a backtrace:

--8<---------------cut here---------------start------------->8---
In debbugs/soap.scm:
101:8 9 (soap-invoke "https://debbugs.gnu.org/cgi/soap.cgi" _ . _)
In web/client.scm:
386:24 8 (http-request _ #:body _ #:port _ #:method _ #:version _ #:keep-alive? _ #:headers _ #:decode-body? _ #:streaming? _ #:request _)
In web/response.scm:
200:48 7 (read-response #<input-output: string 2db5690>)
In web/http.scm:
225:33 6 (read-headers #<input-output: string 2db5690>)
195:11 5 (read-header #<input-output: string 2db5690>)
1606:12 4 (_ "multipart/related; type=\"text/xml\"; start=\"<main_envelope>\"; boundary=\"=-=-=\"")
In ice-9/boot-9.scm:
222:29 3 (map1 (" type=\"text/xml\"" " start=\"<main_envelope>\"" " boundary=\"=-=-=\""))
222:29 2 (map1 (" start=\"<main_envelope>\"" " boundary=\"=-=-=\""))
222:17 1 (map1 (" boundary=\"=-=-=\""))
In web/http.scm:
1609:23 0 (_ " boundary=\"=-=-=\"")
--8<---------------cut here---------------end--------------->8---

The reason why it fails is that Guile processes the response and treats
the *payload* contained in the XML response as HTTP. In this case it
processes the response and stumbles upon a multipart email that contains
a Content-type header specifying a boundary string.

The Content-type handler in (web http) doesn’t like that the boundary
string contains “=” and aborts.
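The parser failure itself can be reproduced without any network traffic by handing the header value straight to the registered Content-Type parser. This is a minimal sketch assuming `parse-header` from `(web http)`; `try-parse-content-type` is a hypothetical helper name. On the Guile version used in this report it should presumably return the symbol `error`; a Guile whose parser understands quoted parameters would return the parsed list instead.

```scheme
(use-modules (web http))

;; Hypothetical helper: run the registered Content-Type parser on STR and
;; return its result, or the symbol error if the parser throws (for
;; example a bad-header exception).
(define (try-parse-content-type str)
  (catch #t
    (lambda () (parse-header 'content-type str))
    (lambda (key . args) 'error)))

(display
 (try-parse-content-type
  "multipart/related; type=\"text/xml\"; start=\"<main_envelope>\"; boundary=\"=-=-=\""))
(newline)
```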

The point is, though, that it shouldn’t even try to parse the payload of
the XML response. If you want to see the full XML response you can use
wget:

wget --post-data="<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\" xmlns:soapenc=\"http://schemas.xmlsoap.org/soap/encoding/\" soapenc:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\"><soap:Body><ns1:get_bug_log xmlns:ns1=\"urn:Debbugs/SOAP\" soapenc:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\"><ns1:bugnumber xsi:type=\"xsd:int\">32514</ns1:bugnumber></ns1:get_bug_log></soap:Body></soap:Envelope>" --header "Content-type: text/xml" -qO - "https://debbugs.gnu.org/cgi/soap.cgi"

Is this a problem with Guile when a response with Content-type text/xml
is received?

--
Ricardo
Mark H Weaver
2018-08-28 21:51:14 UTC
Post by Ricardo Wurmus
> I’m having a problem with http-post and I think it might be a bug. I’m
> talking to a Debbugs SOAP service over HTTP by sending (via POST) an XML
> request. The Debbugs SOAP service responds with a string of XML.
> (use-modules (web client) (sxml simple) (ice-9 receive))
> (let ((req-xml "<soap:Envelope xmlns:soap...>"))
>   (receive (response body)
>       (http-post uri
>                  #:body req-xml
>                  #:headers
>                  `((content-type . (text/xml))
>                    (content-length . ,(string-length req-xml))))
>     ;; Do something with the response body
>     (xml->sxml body #:trim-whitespace? #t)))
> web/http.scm:1609:23: Bad Content-Type header: multipart/related; type="text/xml"; start="<main_envelope>"; boundary="=-=-="
[...]
Post by Ricardo Wurmus
> The reason why it fails is that Guile processes the response and treats
> the *payload* contained in the XML response as HTTP.
No, this was a good guess, but it's not actually the problem.

If you add --save-headers to the wget command line, you'll see the full
response, and the HTTP headers are what's being parsed, as it should be.
It looks like this (except that I removed the carriage returns below):

HTTP/1.1 200 OK
Date: Tue, 28 Aug 2018 21:40:30 GMT
Server: Apache
SOAPServer: SOAP::Lite/Perl/1.11
Strict-Transport-Security: max-age=63072000
Content-Length: 32650
X-Content-Type-Options: nosniff
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: multipart/related; type="text/xml"; start="<main_envelope>"; boundary="=-=-="

<?xml [...]

The problem is simply that our Content-Type header parser is broken.
It's very simplistic and merely splits the string wherever ';' is found,
and then checks to make sure there's only one '=' in each parameter,
without taking into account that quoted strings in the parameters might
include those characters.
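To illustrate the difference, here is a minimal sketch (not the actual patch) of a quote-aware approach: split on ';' only outside double-quoted strings, then cut each parameter at its first '='. The names `split-outside-quotes`, `unquote-value` and `parse-content-type/quoted` are hypothetical. The result shape (a symbol for the media type followed by an alist of symbol/string pairs) is assumed to mirror what `(web http)` produces; unquoting the values is a choice of this sketch.

```scheme
;; Split STR at every DELIM character that is not inside a double-quoted
;; string.  Inside quotes, a backslash escapes the next character.
(define (split-outside-quotes str delim)
  (let loop ((chars (string->list str))
             (current '())      ; characters of the current field, reversed
             (fields '())       ; finished fields, reversed
             (in-quotes? #f))
    (cond
     ((null? chars)
      (reverse (cons (list->string (reverse current)) fields)))
     ((and in-quotes? (char=? (car chars) #\\) (pair? (cdr chars)))
      ;; Keep the backslash and the escaped character verbatim.
      (loop (cddr chars) (cons* (cadr chars) (car chars) current) fields #t))
     ((char=? (car chars) #\")
      (loop (cdr chars) (cons (car chars) current) fields (not in-quotes?)))
     ((and (not in-quotes?) (char=? (car chars) delim))
      (loop (cdr chars) '() (cons (list->string (reverse current)) fields) #f))
     (else
      (loop (cdr chars) (cons (car chars) current) fields in-quotes?)))))

;; Strip surrounding double quotes from V and resolve backslash escapes.
(define (unquote-value v)
  (let ((n (string-length v)))
    (if (and (>= n 2)
             (char=? (string-ref v 0) #\")
             (char=? (string-ref v (- n 1)) #\"))
        (let loop ((chars (string->list (substring v 1 (- n 1)))) (acc '()))
          (cond ((null? chars) (list->string (reverse acc)))
                ((and (char=? (car chars) #\\) (pair? (cdr chars)))
                 (loop (cddr chars) (cons (cadr chars) acc)))
                (else (loop (cdr chars) (cons (car chars) acc)))))
        v)))

;; Parse a Content-Type value into (TYPE (KEY . VALUE) ...).
;; Parameter names never contain '=', so the FIRST '=' separates name
;; and value; any further '=' belongs to the (possibly quoted) value.
(define (parse-content-type/quoted str)
  (let ((parts (map string-trim-both (split-outside-quotes str #\;))))
    (cons (string->symbol (string-downcase (car parts)))
          (map (lambda (param)
                 (let ((eq (string-index param #\=)))
                   (unless eq (error "bad Content-Type parameter" param))
                   (cons (string->symbol
                          (string-downcase
                           (string-trim-both (substring param 0 eq))))
                         (unquote-value
                          (string-trim-both (substring param (+ eq 1)))))))
               (cdr parts)))))
```

With this approach the header from the Debbugs reply parses into `multipart/related` with `type`, `start` and `boundary` parameters, and a ';' inside a quoted value no longer splits the parameter.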

I'll work on a proper parser for Content-Type headers.

Thanks,
Mark
Mark H Weaver
2018-08-29 03:28:19 UTC
Post by Mark H Weaver
>> I’m having a problem with http-post and I think it might be a bug. I’m
>> talking to a Debbugs SOAP service over HTTP by sending (via POST) an XML
>> request. The Debbugs SOAP service responds with a string of XML.
[...]
Post by Mark H Weaver
> The problem is simply that our Content-Type header parser is broken.
> It's very simplistic and merely splits the string wherever ';' is found,
> and then checks to make sure there's only one '=' in each parameter,
> without taking into account that quoted strings in the parameters might
> include those characters.
> I'll work on a proper parser for Content-Type headers.
I've attached preliminary patches to fix the Content-Type header parser,
and also to fix the parsing of response header lines to support
continuation lines.
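Continuation lines are the obsolete header "folding" in which a line starting with a space or tab continues the previous header; RFC 7230 calls this obs-fold and asks a user agent that receives it in a response to replace each fold with spaces. A minimal sketch of unfolding, assuming the raw header lines have already been read into a list of strings without their CRLF terminators; `unfold-header-lines` is a hypothetical name, not part of `(web http)`.

```scheme
;; Merge folded continuation lines (lines starting with a space or tab)
;; into the preceding header line, replacing each fold with one space.
(define (unfold-header-lines lines)
  (let loop ((lines lines) (acc '()))
    (cond
     ((null? lines) (reverse acc))
     ((and (pair? acc)
           (> (string-length (car lines)) 0)
           (memv (string-ref (car lines) 0) '(#\space #\tab)))
      ;; string-trim drops the leading whitespace of the continuation.
      (loop (cdr lines)
            (cons (string-append (car acc) " " (string-trim (car lines)))
                  (cdr acc))))
     (else (loop (cdr lines) (cons (car lines) acc))))))
```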

With these patches applied, I'm able to fetch and decode the SOAP
response that you fetched with your 'wget' example, as follows:

--8<---------------cut here---------------start------------->8---
***@jojen ~/guile-stable-2.2 [env]$ meta/guile
GNU Guile 2.2.4.10-4c91d
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (use-modules (web http) (web uri) (web client) (sxml simple) (ice-9 receive))
scheme@(guile-user)> ,pp (let ((req-xml "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\" xmlns:soapenc=\"http://schemas.xmlsoap.org/soap/encoding/\" soapenc:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\"><soap:Body><ns1:get_bug_log xmlns:ns1=\"urn:Debbugs/SOAP\" soapenc:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\"><ns1:bugnumber xsi:type=\"xsd:int\">32514</ns1:bugnumber></ns1:get_bug_log></soap:Body></soap:Envelope>"))
  (receive (response body-port)
      (http-post "https://debbugs.gnu.org/cgi/soap.cgi"
                 #:streaming? #t
                 #:body req-xml
                 #:headers
                 `((content-type . (text/xml))
                   (content-length . ,(string-length req-xml))))
    (set-port-encoding! body-port "UTF-8")
    (xml->sxml body-port #:trim-whitespace? #t)))
$1 = (*TOP* (*PI* xml "version=\"1.0\" encoding=\"UTF-8\"")
(http://schemas.xmlsoap.org/soap/envelope/:Envelope
(@ (http://schemas.xmlsoap.org/soap/envelope/:encodingStyle
"http://schemas.xmlsoap.org/soap/encoding/"))
(http://schemas.xmlsoap.org/soap/envelope/:Body
(urn:Debbugs/SOAP:get_bug_logResponse
(http://schemas.xmlsoap.org/soap/encoding/:Array
(@ (http://www.w3.org/1999/XMLSchema-instance:type
"soapenc:Array")
(http://schemas.xmlsoap.org/soap/encoding/:arrayType
"xsd:ur-type[4]"))
(urn:Debbugs/SOAP:item
(urn:Debbugs/SOAP:header
(@ (http://www.w3.org/1999/XMLSchema-instance:type
"xsd:string"))
"Received: (at submit) by debbugs.gnu.org; 23 Aug 2018 20:17:46 +0000\nFrom debbugs-submit-***@debbugs.gnu.org [...]
[...]
--8<---------------cut here---------------end--------------->8---

Note that I needed to make two other changes to your preliminary code,
namely:

* I passed "#:streaming? #t" to 'http-post', to ask for a port to read
the response body instead of reading it eagerly.

* I explicitly set the port encoding to "UTF-8" on that port before
using 'xml->sxml' to read it.

Otherwise, the entire 'body' response will be returned as a bytevector,
because the response Content-Type is not recognized as a textual type.
The HTTP Content-Type is "multipart/related", with a parameter:
type="text/xml". I'm not sure if we should be automatically
interpreting that as a textual type or not.

There's no 'charset' parameter in the Content-Type header, but the XML
internally specifies: encoding="UTF-8".
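If streaming is not wanted, the non-streaming call can presumably be kept by decoding the returned bytevector explicitly. This is a minimal sketch assuming, as the XML prolog here declares, that the body is UTF-8; `body->sxml` is a hypothetical helper, and `utf8->string` and `bytevector?` come from `(rnrs bytevectors)`.

```scheme
(use-modules (sxml simple) (rnrs bytevectors))

;; Hypothetical helper: turn the body returned by a non-streaming
;; http-post into SXML.  Guile hands back a bytevector when it does not
;; treat the Content-Type as textual, so decode it as UTF-8 first (an
;; assumption based on encoding="UTF-8" in the XML prolog).
(define (body->sxml body)
  (xml->sxml (if (bytevector? body) (utf8->string body) body)
             #:trim-whitespace? #t))
```

In the original non-streaming snippet, `(body->sxml body)` would take the place of `(xml->sxml body #:trim-whitespace? #t)`.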

Anyway, here are the preliminary patches.

Mark
Ricardo Wurmus
2018-08-29 10:26:02 UTC
Hi Mark,
[…]
Post by Mark H Weaver
Post by Ricardo Wurmus
>> The reason why it fails is that Guile processes the response and treats
>> the *payload* contained in the XML response as HTTP.
> No, this was a good guess, but it's not actually the problem.
You are right. I also ended up trying with “wget --save-headers” after
Post by Mark H Weaver
> Content-Type: multipart/related; type="text/xml"; start="<main_envelope>"; boundary="=-=-="
> <?xml [...]
I assumed it was part of the payload when it really was a regular
header after all.
Post by Mark H Weaver
> The problem is simply that our Content-Type header parser is broken.
> It's very simplistic and merely splits the string wherever ';' is found,
> and then checks to make sure there's only one '=' in each parameter,
> without taking into account that quoted strings in the parameters might
> include those characters.
Right. I worked around this in guile-debbugs simply by replacing the
Content-Type header parser with one that lacks the check for the unique
“=” in the string part.
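A workaround along those lines might look like the sketch below. It is not the actual guile-debbugs code: `lenient-parse-content-type` is a hypothetical name, and stripping the surrounding quotes from values is an assumption of this sketch. It uses `declare-header!`, `header-validator` and `header-writer` from `(web http)` to replace only the parser while keeping Guile's own validator and writer. Like the original parser, it still splits on every ';', even inside quoted strings.

```scheme
(use-modules (web http) (srfi srfi-1))

;; Lenient Content-Type parser: split on ';' and cut each parameter at
;; its FIRST '=', so values such as boundary="=-=-=" are accepted.
;; Parameters without any '=' are silently skipped.
(define (lenient-parse-content-type str)
  (let ((parts (map string-trim-both (string-split str #\;))))
    (cons (string->symbol (string-downcase (car parts)))
          (filter-map
           (lambda (param)
             (let ((eq (string-index param #\=)))
               (and eq
                    (cons (string->symbol
                           (string-downcase
                            (string-trim-both (substring param 0 eq))))
                          ;; Trim whitespace, then surrounding double quotes.
                          (string-trim-both
                           (string-trim-both (substring param (+ eq 1)))
                           #\")))))
           (cdr parts)))))

;; Swap in the lenient parser, keeping Guile's validator and writer.
(declare-header! "Content-Type"
                 lenient-parse-content-type
                 (header-validator 'content-type)
                 (header-writer 'content-type))
```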
Post by Mark H Weaver
> I'll work on a proper parser for Content-Type headers.
Thanks!

--
Ricardo
