Discussion:
The <iframe> element and sandboxing ideas
Ian Hickson
2008-05-21 22:30:48 UTC
Summary:

* I've added a sandbox="" attribute to <iframe>, which by default
disables a number of features and takes a space-separated list of
features to re-enable:

- by default, content in sandboxed browsing contexts, and any
browsing contexts nested in them, have a unique origin
(independent of the origin of their URI); this can be overridden
using the "allow-same-origin" keyword

- by default, all form controls in those browsing contexts are
disabled; this can be overridden using the "allow-forms"
keyword

- by default, script in those browsing contexts cannot run; this can
be overridden using the "allow-scripts" keyword

- content in those browsing contexts cannot navigate other
browsing contexts outside of the sandbox (seamless="" below
overrides this)

- content in those browsing contexts cannot create new browsing
contexts or open modal dialogs or alerts

- all plugins in those browsing contexts are disabled

* I've added a seamless="" boolean attribute to <iframe>, which, if
the content's active document's URI has the same origin as the
container, causes the iframe to size vertically to the bounding box
of the contents, and horizontally to the width of the container,
and which causes the initial containing block of the contents to be
treated as zero height. In addition, styles on the root element of
the content must inherit from the <iframe> instead of being the
initial values, and the style sheets that apply to the <iframe>
must also apply to the contents. In addition, any time the browsing
context navigates itself, the parent browsing context gets
navigated instead.
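The keyword handling in the sandbox="" summary can be pictured as a small mapping from the attribute value to a set of restriction flags. This is a hypothetical JavaScript sketch, not spec text: the function and flag names are illustrative, and splitting on whitespace, matching case-insensitively and ignoring unknown keywords are assumptions the summary does not state.

```javascript
// Hypothetical helper (not part of the proposal): map a sandbox="" attribute
// value onto the restrictions listed in the summary. Assumptions: keywords are
// split on HTML whitespace, compared case-insensitively, unknown ones ignored.
function parseSandboxFlags(attributeValue) {
  const keywords = new Set(
    attributeValue
      .split(/[ \t\n\f\r]+/)
      .filter((token) => token !== "")
      .map((token) => token.toLowerCase())
  );
  return {
    // Restrictions that a keyword can lift.
    uniqueOrigin: !keywords.has("allow-same-origin"),
    formsDisabled: !keywords.has("allow-forms"),
    scriptsDisabled: !keywords.has("allow-scripts"),
    // Restrictions that no keyword lifts. (The summary says seamless=""
    // overrides the navigation restriction; that is not modelled here.)
    navigationOutsideSandboxBlocked: true,
    newBrowsingContextsAndDialogsBlocked: true,
    pluginsDisabled: true,
  };
}
```

For example, parseSandboxFlags("allow-scripts allow-forms") re-enables scripts and forms but leaves uniqueOrigin true, so the content is still treated as coming from a unique origin.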

This is all HIGHLY EXPERIMENTAL. I am looking for feedback on the general
approaches taken.

There are various things that this doesn't address yet; e.g. there's no
way to force (or even allow) a non-seamless iframe to open links in the
parent window.
Let's imagine a blogging website that allows anybody to create a blog
which is available as http://www.example.com/blogs/username/. Many such
sites allow various user customization, so imagine this site lets the
blog owner supply custom HTML to display on top of the blog page.
This is primarily used by blog authors to design stylish navigation. To
make such navigation menus more attractive, the authors wish to use
JavaScript and Flash, but unrestricted JavaScript would make it possible
for the blog owner to steal visitors' session cookies.
HTML to display on top of your blog: [TEXTAREA]
[SUBMIT]
A malicious blog owner might enter:
Welcome to my blog!</sandbox><a href="#"
onclick="alert(document.cookie)">Click here</a>
After submission, this code is fed to the HTML cleaner. At present, HTML
cleaners are usually complicated scripts which try to catch known quirks
of the user agents, and still they usually have security holes found one
after another. See for example
http://cvs.livejournal.org/browse.cgi/livejournal/cgi-bin/cleanhtml.pl.
With the HTML 5 parsing spec, there will be one single algorithm for parsing
HTML code with well-defined error recovery. So, the HTML cleaner at the
server side runs the HTML 5 parser on the user-supplied text, which
produces the following DOM:
* Welcome to my blog!
* A
    href="#"
    onclick="alert(document.cookie)"
    * Click here
The </sandbox> tag is ignored as an easy parse error because there is no
matching <sandbox> tag in the user-supplied text. After parsing, the
HTML cleaner iterates through the tree, renaming potentially unsafe
attributes:
* Welcome to my blog!
* A
    href="#"
    safe-onclick="alert(document.cookie)"
    * Click here
At the final stage, the HTML cleaner re-serializes the DOM into the
following markup:
Welcome to my blog!<a href="#"
safe-onclick="alert(document.cookie)">Click here</a>
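A rough JavaScript sketch of the renaming and re-serialization steps just described. It assumes a hypothetical tree format (strings for text nodes, { tag, attrs, children } for elements) in place of a real DOM produced by an HTML 5 parser, and it only handles on* attributes: a real cleaner would also have to deal with <script> and <style> elements, style="" attributes, javascript: URLs and void elements such as <br>, none of which this sketch covers.

```javascript
// Hypothetical tree shape standing in for the parser's output: text nodes are
// strings, elements are { tag, attrs, children }.

// Rename every on* attribute to safe-on*, recursively.
function renameUnsafeAttributes(node) {
  if (typeof node === "string") return node; // text is left alone
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    attrs[/^on/i.test(name) ? "safe-" + name : name] = value;
  }
  return {
    tag: node.tag,
    attrs,
    children: node.children.map(renameUnsafeAttributes),
  };
}

// Re-serialize, escaping so that text and attribute values cannot turn
// back into markup.
function serialize(node) {
  if (typeof node === "string") {
    return node.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  const attrs = Object.entries(node.attrs)
    .map(([name, value]) =>
      ` ${name}="${value.replace(/&/g, "&amp;").replace(/"/g, "&quot;")}"`)
    .join("");
  return `<${node.tag}${attrs}>${node.children.map(serialize).join("")}</${node.tag}>`;
}

// The parsed fragment from the walkthrough (the stray </sandbox> is gone).
const fragment = [
  "Welcome to my blog!",
  { tag: "a", attrs: { href: "#", onclick: "alert(document.cookie)" }, children: ["Click here"] },
];
const cleaned = fragment.map(renameUnsafeAttributes).map(serialize).join("");
```

Here cleaned is the same single line of markup as the serialized output above.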
When the site renders the blog page, it puts the "HTML for page top"
inside a <sandbox> element:
<body>
<sandbox>
Welcome to my blog!<a href="#" safe-onclick="alert(document.cookie)">Click
here</a>
</sandbox>
...
</body>
Each blog entry is probably also contained in its own sandbox. This is
even more important on the so-called friends pages, where entries by
different authors are displayed on the same page.
When the page is rendered in a modern user agent which supports
sandboxing, the safe-onclick attribute is interpreted exactly the same
as onclick. When the user clicks the link, the event handler is
executed. Because the code is inside the sandbox, it operates on a fake
document object, so it doesn't retrieve the cookies (I think
document.cookie should just return an empty string). The visitor's
session cookies are safe.
When the page is rendered in an older user agent which doesn't support
sandboxing, the safe-onclick attribute is ignored because it is unknown.
When the user clicks the link, no event handler is executed, and the
cookies are safe again.
You can do this now (though it's far uglier) by taking the author's markup
and converting it to base64, and then stuffing it into an iframe something
like this:

<iframe seamless sandbox="allow-scripts allow-forms"
src="data:text/html;base64,PCFET0NUWVBFIEhUTUw%2BPHRpdGxlPjwvdGl0bGU%2BV2VsY29tZSB0byBteSBibG9nITwvc2FuZGJveD48YSBocmVmPSIjIiBvbmNsaWNrPSJhbGVydChkb2N1bWVudC5jb29raWUpIj5DbGljayBoZXJlPC9hPg0K">
</iframe>

This isn't very readable, I'll grant you. I'm thinking of introducing a
new attribute. I haven't worked out what to call it yet, but definitely
not "src", "source", "src2", "content", "value", or "data" -- maybe
"html" or "doc", though neither of those are great. This attribute would
take a string which would then be interpreted as the source document
markup of an HTML document, much like the above; it would override src=""
if it was present, allowing src="" to be used for legacy UAs:

<iframe seamless sandbox="allow-scripts allow-forms" doc="
<!DOCTYPE HTML>
<title></title>
Welcome to my blog!
</sandbox>
<a href='#' onclick='alert(document.cookie)'>Click here</a>
"></iframe>

(There are things we can do to make this better, e.g. make the <!DOCTYPE
HTML> and <title></title> bits implicit, maybe introducing type="" to say
whether it's HTML or XML instead of only supporting HTML, maybe saying
that if src="" and doc="" are both specified they must have identical
data, etc.)

Comments and suggestions on this are welcome. I haven't added it to the
spec yet. I do agree that without this or something equivalent that we
don't have a solution for sandboxing embedded blog comments yet.
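For reference, the base64 data: URI in the iframe example can be generated mechanically. A minimal Node.js sketch, assuming Node's Buffer for the base64 step (a browser would need btoa(), which only accepts Latin-1); the function names are illustrative:

```javascript
// Encode markup as a base64 data: URI.
function toDataUri(markup) {
  const base64 = Buffer.from(markup, "utf8").toString("base64");
  // encodeURIComponent turns '+' into %2B (as in the example), and '/' and
  // '=' into %2F and %3D; browsers percent-decode a data: URL's body before
  // base64-decoding it.
  return "data:text/html;base64," + encodeURIComponent(base64);
}

// Wrap it in the sandboxed, seamless iframe from the example.
function sandboxedIframe(markup) {
  return '<iframe seamless sandbox="allow-scripts allow-forms" src="' +
    toDataUri(markup) + '">\n</iframe>';
}
```

Passing toDataUri the markup that the example encodes (the DOCTYPE, an empty title, the blog text with the stray </sandbox>, the link, and a trailing CRLF) produces exactly the src="" value shown there.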
The idea is basically an element like <iframe> but that renders the
linked page, instead of inside a square area, in flow with the main
page. This idea is really rough still, but I hope to try to implement it
in the not too distant future to solidify it a bit. One thing very much up
in the air is what the element would be called. Suggestions welcome, but
I'm using the name <include> below.
I've basically added this to <iframe> using the seamless="" attribute.
Should the stylesheets of the outer or the inner document be used?
I went with "yes".
When a fragment identifier is specified, should we render that element,
or its children?
I went with making that work the same as with normal <iframe>s (so likely
no effect if the default shrink-wrapping-to-boundary-box behaviour is in
effect).
Should style be inherited from the parent of the <include>, or from the
DOM parent in the inner document?
I've made inheritance happen from <iframe> to root element.
Should the inner DOM be rendered inside of, or in place of the <include>?
I've made this happen as with <iframe>.
https://bugzilla.mozilla.org/show_bug.cgi?id=80713
I've taken the notes there into account.
There's a big difference between that and what I'm proposing. With what's
in bug 80713 you're still limited to a box that basically doesn't take
part in the outer page at all. For example in the table example in my
original post the headers of the table would not resize to fit the
column sizes in the <include>ed table.
Woah. That's far more radical. I have no idea how to do that. How would
you make the parser not generate the implied elements and switch straight
to the "in table" mode? How would you make the CSS model work with this?
How would you define conformance for the document fragments?
Would documents included via <include> run in the security context of
the including page, as with the script technique, or would they run in
the context of the included document, as with iframes?
The sandbox="" attribute can be specified to change it from the former to
the latter (and in fact, from the former to an isolated origin regardless
of the true origin of the document).
They would run in the context of the included page, just like an iframe.
The processing of <include> is exactly that of <iframe>; the only
difference is in the rendering.
It may be worth bringing this up with the CSSWG if it really is just a
rendering issue.
XBL has an attribute to cover inherited styles, so you're right.
Realistically, I can't see Microsoft ever implementing XBL (I hope I'm
wrong). So adding it to HTML might be the only way to achieve this
functionality.
Inventing a new technology that does the same as another on the basis that
the UAs will implement one but not the other seems dubious at best.
Kind of like an <iframe> but without an external source.
Would the doc="" proposal above be enough?
I wonder if this issue could be solved on the layout/CSS level by
providing a way to make the height of an iframe depend on the actual
height of the root element of the document loaded in the iframe. That
is, would it be feasible to make the iframe contents have the layout/UI
feel of a part of the parent page while keeping the DOMs and script
security contexts separate?
That's pretty much what seamless="" does, yes.
http://www.w3.org/TR/css3-box/#intrinsic0 (and also CSS2 10.6)
Since CSS doesn't attempt to specify the intrinsic width of a document in an
iframe, maybe HTML5 should specify that the intrinsic width of a document
be:
- if the CSS width property is specified on the html element, the margin-box
of the page at that width (which may have overflow)
- else, if the CSS min-width property is specified on the html element, the
margin-box of the page at that width (which may have overflow)
- else, the smallest width the page can have without horizontal scrolling
and that its intrinsic height be:
- if the CSS height or min-height properties are set, similar to above
- else, the smallest height the page can have at the intrinsic width of the
document without vertical scrolling
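The quoted width/height rules amount to a short decision procedure. A JavaScript sketch, with the actual measurement hidden behind hypothetical layout callbacks (a real implementation needs a layout engine). Letting height win over min-height mirrors the width rules and is an assumption, since "similar to above" leaves it open.

```javascript
// rootStyle: computed values on the html element (null/undefined = not set).
// layout: hypothetical measuring callbacks standing in for a renderer.
function intrinsicSize(rootStyle, layout) {
  let width;
  if (rootStyle.width != null) {
    width = layout.marginBoxWidthAt(rootStyle.width);
  } else if (rootStyle.minWidth != null) {
    width = layout.marginBoxWidthAt(rootStyle.minWidth);
  } else {
    width = layout.smallestWidthWithoutHorizontalScroll();
  }

  let height;
  if (rootStyle.height != null) {
    height = layout.marginBoxHeightAt(rootStyle.height);
  } else if (rootStyle.minHeight != null) {
    height = layout.marginBoxHeightAt(rootStyle.minHeight);
  } else {
    // Measured at the intrinsic width chosen above.
    height = layout.smallestHeightWithoutVerticalScroll(width);
  }
  return { width, height };
}
```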
That seems overly complicated, but the spec says something similar in
fewer words.
I see you have done some work to prevent reflow loops with percentage
root heights > 100%, but how does your patch handle an iframe document
that looks like this? (I can think of nastier testcases also, where
"bottom" is embedded further down in the document)
<html>
<head>
</head>
<body>
<div style="position:absolute;bottom:-5px;">This will force a scrollbar on the
document</div>
</body>
</html>
As far as I can tell, the spec handles this fine.
What about encoding the content of each comment iframe in a "data:" URI?
That unfortunately isn't compatible with IE, and has rather unfortunate
non-trivial escaping requirements.
The contents of an iframe with a data: URI source should be trusted,
unlike an iframe with an http: URI source from another domain. A script
in an iframe with a data: URI source should, by default, be able to
communicate with the parent window. So, that alone doesn't solve the
problem.
Adding sandbox="" solves this (at least for new UAs).
Not to mention that data: URIs are ugly, wasteful (because of the BASE64
encoding), cannot be read and written by humans directly, and have
maximum length problems in some implementations.
Right.
Yes, I want the sandbox to degrade securely, as does any webmaster who
might allow some user-supplied scripting while relying on
sandboxing for security. To cover its use cases, this feature must
degrade securely.
Degrade securely _and usefully_, or just securely (and maybe to nothing)?

The latter is handled by the doc="" proposal. The former may be impossible
without server-side filtering.
This does degrade securely, doesn't require separate HTTP requests, and
maintains human readability.
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2005-December/005301.html
This still requires server-side filtering, though.
People are already struggling to remove all scripts from HTML snippets.
I don't think finding all these occurrences and replacing them is going
to be much better. Also, you'd need safe-style="" and <safe-style> too,
since IE can embed JavaScript expressions into style rules. (And now
let's hope IE does not allow expressions elsewhere.)
Indeed.
This principle could be transposed to <sandbox>, where it could be
defined as taking the unsafe HTML content from an attribute. And the
best part: you don't need anything else like the safe-* substitutions as
<sandbox type="text/html" content="
document.write(window.parent.location)
">
Alternative, possibly degraded but safe content for older browsers.
</sandbox>
I think we'd want to use <iframe> for this, but otherwise, yes.
Would you really want separate security contexts for each comment?
I wouldn't want to allow people to screw up others' comments, making it
look as if other users wrote what they didn't write. So, yes, it's
important that any code within a comment cannot change anything but
itself. This also means that the comment should be unable to change the
header/footer around it to pretend that someone else wrote it.
Documents per comment are expensive, but they do seem to be what we need
(or maybe want) here.
The OP probably meant that maintaining so many contexts would cause a
comparable deterioration in performance. All user comments should be
put in one security context.
With all comments grouped together in such a manner, you could even use
an inline frame.
While simple, this wouldn't let you do things like have trusted content
interleaved with comments (e.g. "edit" and "reply" links), which is
common.
I really think comments are a bad use case. Why would someone allow
scripts in comments in any context, much less a sandboxed one?
You wouldn't, but you would want to prevent scripts from running
altogether.
The best use case I have thought of so far is MySpace et al., a site
where users have their own page with limited permission in the context
of the overall site. MySpace solves this by not allowing scripts at
all, as most such web sites do. If possible, such sites might allow a
user to insert widget scripts with limited permissions. For this use
case, iframe isn't ideal, either, but limited scripting and styling are
desired.
Would the spec's current proposals work?
This probably depends on the use cases in question. For some use
cases, the status quo is in fact the script running with full
privileges, so while not being ideal, it is indeed acceptable; in
other cases, you wouldn't want scripts to run at all if they weren't
limited in some way.
A security feature, by definition, protects the users from a certain
class of attacks. An attack needs to succeed in only one browser
to do harm. For example, a malicious advertising script which actually
steals passwords entered by users on the host page is dangerous enough
even if the attacker only succeeds in stealing passwords of just a
fraction of the users.
I can't really imagine a scenario in which sandbox restrictions could be
somehow considered optional. Wherever there is need for such
restrictions, it's unacceptable to run the script without them
implemented.
In some cases the sandbox would be "defence in depth" -- for example, in
all cases where user-generated content is embedded today.
1. Doesn't require loading of a separate document via a separate HTTP
request, and without the ugliness of data: URIs. If there was some
"inline" version of <iframe>, such as <iframe>content</iframe>, that
would be just fine.
doc="" would handle this, then...
2. Implements the security barrier even though the inner content doesn't
come from a different domain. <iframe> would require a separate domain
for that.
sandbox="" does this now.
3. The security barrier is asymmetric, i.e. the outer scripts have
access to the inner content, but not the other way round.
What's the use case for this?
All attempts to treat user-submitted HTML as a string are doomed to
having such vulnerabilities. <sandbox> alone doesn't add much to this
problem. Just look at the complexity of the HTML sanitizer in LiveJournal,
which allows some user-submitted markup but not all.
That's one advantage of the doc="" idea; it makes sanitising mostly
trivial compared to all other ideas for this.
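To make the "mostly trivial" point concrete: with the proposed doc="" attribute, server-side handling of untrusted markup reduces to escaping one attribute value. A sketch, assuming a double-quoted attribute; "doc" is the placeholder name from the proposal, and the fallback URL is hypothetical. Whatever src="" points at is what legacy UAs load, so that copy still has to be filtered.

```javascript
// Inside a double-quoted attribute value, only '&' and '"' can change how the
// HTML parser reads the value, so escaping those two is enough; '<', '>' and
// a stray '</sandbox>' remain inert text.
function escapeAttributeValue(text) {
  return text.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical: doc="" is the proposal's placeholder attribute; src="" is the
// filtered fallback for user agents that ignore doc="".
function iframeWithDoc(untrustedMarkup, filteredFallbackUrl) {
  return '<iframe seamless sandbox="allow-scripts allow-forms"' +
    ' doc="' + escapeAttributeValue(untrustedMarkup) + '"' +
    ' src="' + escapeAttributeValue(filteredFallbackUrl) + '"></iframe>';
}
```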
<sandbox secret="09f9...">Hello World</sandbox secret="09F9...">
In other words, make them match. So any inserted </sandbox> tags
wouldn't close the sandbox unless they knew the secret - which they
couldn't do, because they have the chicken-and-egg problem of having to
be able to read the page first.
This relies on the author being able to reliably produce unpredictable
content, which is a very dubious responsibility to put on many authors.

Also, it would make the XML guys have a fit. Then again, maybe that goes
in the "pro" column and not the "con" column...
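The burden described here is generating a token that the author of the untrusted content cannot predict. A Node.js sketch of just that part, using the crypto module's CSPRNG; the <sandbox secret="..."> syntax was only floated in this thread, and the helper name is hypothetical.

```javascript
const crypto = require("crypto");

// Wrap untrusted HTML in the proposed <sandbox secret="..."> syntax (never
// specified). A counter, timestamp or Math.random() value is not enough: the
// token must be unguessable, hence a cryptographically secure generator.
function wrapWithSecret(untrustedHtml) {
  let secret;
  do {
    secret = crypto.randomBytes(16).toString("hex"); // 128 random bits
  } while (untrustedHtml.includes(secret)); // cheap guard; practically never loops
  return `<sandbox secret="${secret}">${untrustedHtml}</sandbox secret="${secret}">`;
}
```

Every page generation needs a fresh token, which is exactly the per-author responsibility objected to above.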
http://www.gerv.net/security/content-restrictions/ , specifically the
"hierarchy" restriction, allows the <iframe> content to be isolated from
the parent.
It's not entirely clear what the proposal here is; as far as I can tell
it's an HTTP header. Is that right? Self-describing the security
restrictions on content works for same-site serving, but not really for
third-party content.
IE has the proprietary "security" attribute on <iframe> which restricts
http://msdn2.microsoft.com/en-us/library/ms534622.aspx
I tried using this, but it was tied too closely to IE's own security
concepts to really make use of it, sadly.
Documents don't have intrinsic dimensions, and the user's default font
size is likely to vary from user to user. How would you know what
height and width to give?
You give it the dimensions of an industry-standard ad banner size.
The same way you would know what height and width to give to a
non-replaced element. Why should an embedded document not be able to
render as if the contents of the document were present inline in the
parent document? Backwards compatibility should probably trump better
behavior here, but why is it not possible to specify this through CSS?
I've heard of this problem multiple times. For example,
http://weblogs.mozillazine.org/gerv/archives/2005/02/autosizing_ifra.html
I've added height/width back.
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Andrew Fedoniouk
2008-05-21 23:45:32 UTC
Post by Ian Hickson
* I've added a sandbox="" attribute to <iframe>, which by default
disables a number of features and takes a space-separated list of
...

Makes sense, Ian.

In addition to this, what about adding a <meta> tag that disables or
limits features of the page if it is running inside <frame> or <iframe>?

Say something like this:

<html>
<head>
<meta name="allowed-context" content="standalone-only" />
</head>
...
</html>

That may prevent some types of malicious uses.
--
Andrew Fedoniouk.

http://terrainformatica.com
Martin Atkins
2008-05-22 12:20:49 UTC
Post by Ian Hickson
* I've added a sandbox="" attribute to <iframe>, which by default
disables a number of features and takes a space-separated list of
[snip list]

Unless I'm missing something, this attribute is useless in practice
because legacy browsers will not impose the restrictions. This means
that as long as legacy browsers exist (i.e. forever) server-side
filtering must still be employed to duplicate the effects of the sandbox.

One alternative would be to use a different element name so that
fallback content can be provided for legacy browsers. In the short term,
this is likely to be something like this:

<sandbox src="/comments/blah">
<iframe src="/comments/blah?do-security-filtering=1"></iframe>
</sandbox>

Once a large percentage of browsers support <sandbox> authors can start
to be less accommodating with their fallback content, either by
filtering out HTML tags entirely (which I'd assume is easier than just
filtering out script) or at the extreme just setting the fallback
content to be "Your browser is not supported".

This comment does not address "seamless", which seems to be orthogonal
and can thus be equally applied to both sandbox and iframe as currently
specified.
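Filtering out HTML tags entirely can be as simple as escaping the content so the browser shows it as text. A sketch of that swapped-in approach (escaping rather than stripping) for the server side of a fallback URL like the /comments/blah?do-security-filtering=1 example; the function name is hypothetical.

```javascript
// Escape every markup-significant character so user-supplied HTML is
// displayed literally instead of being interpreted.
function escapeAllMarkup(userHtml) {
  return userHtml
    .replace(/&/g, "&amp;") // must come first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

The trade-off is the one described above: the fallback stays safe but loses all formatting.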
Kristof Zelechovski
2008-05-22 15:13:57 UTC
Legacy browsers will use @SRC which must be filtered. They will ignore the
new content (whatever the attribute name will be) altogether so it need not
be filtered. Fallback @SRC can contain a URL to an error page saying "Sorry,
not in your browser".
Chris

Jon Ferraiolo
2008-05-22 14:58:06 UTC
FYI - We have had some discussion in and around the topic of better iframes
at OpenAjax Alliance:

http://www.openajax.org/runtime/wiki/Better_IFrames_Better_Sandboxing

However, people see shortcomings with all proposals listed on that page.
Our hope was that the HTML5 leaders would figure out a good approach, so I
am glad to see that Ian has started discussion on this topic.

Regarding Martin's comments, I think it is a correct objective to find a
bridge between what exists with today's browsers and what we hope will
exist in future browsers. The Ajax community usually needs to get the
desired result in both legacy browsers and new browsers.

If you need to sandbox in today's browsers, the community tends to use
one of two approaches: (1) require that sandboxed components be expressed
in a restricted subset of HTML and/or JavaScript, such as Caja or ADSafe or
the markup restrictions for portlets, (2) place the sandboxed components
into an IFRAME whose domain or subdomain differs from everything else on
the page (i.e., leveraging the browser's same-origin policy to achieve
sandboxing). The problem with approach #1 is that some functionality
(potentially critical) is disabled and developers have to in effect learn a
new language. The problem with approach #2 is that isolation is an
all-or-nothing proposition and there are shortcomings with IFRAMEs, such as
lack of automatic content sizing. Ian's proposal below addresses these
IFRAME shortcomings directly, which is great.

If I had time to think extensively about this issue (which I don't), I
would attempt to craft a proposal that used an approach where an Ajax
library performed the mapping between what exists today (i.e., IFRAME) and
what would exist in the future, where Ajax libraries could be eliminated
once older browsers were put out of commission. My initial thought would be
to put a 'sandbox' attribute on a DIV rather than on an IFRAME. That way,
you end up with more powerful sandboxing, along the lines of what Doug
Crockford proposed with his <module> tag. Newer browsers would deliver the
sandboxing features that Ian is proposing below. For older browsers,
someone could author an Ajax library that looked for DIV elements with a
'sandbox' attribute, and under the hood transformed the DOM such that it
achieved sandboxing via IFRAMEs and implemented the flexibility that Ian
describes in his proposal via typical ugly Ajax hacks, such as passing
messages via postMessage (or the even uglier fragment identifier approach).

Jon
Jon Ferraiolo
2008-05-25 19:02:48 UTC
Further comments after attending a talk at an IEEE security workshop (where
Ian's proposal was presented to various security experts):

1) I take back my suggestion about considering <div sandbox="..."> versus
Ian's original <iframe sandbox="..." />. Ian's original approach, although
more restrictive, does start off from a foundation based on security
concerns and then attempts to find ways to loosen them. The problem with
<div> is that if the content is not well-formed and inserts an extra
</div>, then the content after the </div> would not be sandboxed.

2) Olaf suggested that there might be another attribute to propagate
events. This is definitely highly desirable in some scenarios. Note that
the CDF WG has done some work that relates at least partially, although I
wouldn't be surprised if Ian isn't all that positive on CDF. Nevertheless,
here is the spec: http://www.w3.org/TR/WICD/. The WICD spec focuses on
various aspects of not just event propagation, but also hyperlink
propagation and focus management. All of these topics seem worthy of
consideration in terms of bridging between the host web page and any of the
iframes it embeds.

3) It seems to me that for some of the propagation areas (e.g., CSS
propagation, event propagation, focus-model integration) you would want
both the container and the component to opt-in before the propagation
occurred. For example, with CSS propagation, there may be cases where the
component only wants certain of its own characteristics to be stylable by
the parent. If you look at typical Ajax widgets, which use CSS for
controlling the visual rendering, there are some aspects which are meant to
be stylable by the developer, some aspects that are meant to be "themable"
(i.e., stylable via a shared theme), and other aspects which the widget
needs to control exclusively and should not be overridden. I would assume
that there are also security issues with allowing the parent to override
the styling of an embedded iframe because conceivably someone could invoke
a bank website within an iframe and it wouldn't be good if the parent could
override some of the CSS for the bank's website. Similarly, you probably
wouldn't want the parent frame to be able to listen to keystrokes that
happen within the child iframe (e.g., your password). For some of the
information passing between parent and child, it might be best to somehow
use a publish/subscribe mechanism like postMessage(), where both
parent and child have to opt in before the propagation occurs.
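A rough sketch of the mutual opt-in described in point 3, using postMessage(): the child must choose to post a message, and the parent only acts on messages from that iframe whose type it has explicitly accepted. The handler factory, the message type names and the applyResize callback are hypothetical. Matching on event.source rather than event.origin is deliberate: in current browsers a sandboxed iframe with a unique origin reports its origin as the string "null".

```javascript
// Returns a "message" event handler that accepts only messages from one
// child window and only of the types the parent has opted in to.
function makeMessageHandler(expectedSource, allowedTypes, onMessage) {
  return function handle(event) {
    if (event.source !== expectedSource) return false; // not our iframe
    const data = event.data;
    if (data === null || typeof data !== "object") return false;
    if (!allowedTypes.has(data.type)) return false; // parent did not opt in
    onMessage(data);
    return true;
  };
}

// Usage in a page (iframe and applyResize are hypothetical):
// window.addEventListener("message",
//   makeMessageHandler(iframe.contentWindow, new Set(["resize"]), applyResize));
```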

Jon
Post by Jon Ferraiolo
FYI - We have had some discussion in and around the topic of better
http://www.openajax.org/runtime/wiki/Better_IFrames_Better_Sandboxing
However, people see shortcomings with all proposals listed on that
page. Our hope was that the HTML5 leaders would figure out a good
approach, so I am glad to see that Ian has started discussion on this
topic.
Post by Jon Ferraiolo
Regarding Martin's comments, I think it is a correct objective to
find a bridge between what exists with today's browsers and what we
hope will exist in future browsers. The Ajax community usually needs
to get the desired result in both legacy browsers and new browsers.
If you need to sandbox in today's browsers, what the community tends
to use one of two approaches: (1) require that sandboxed components
be expressed in a restricted subset of HTML and/or JavaScript, such
as Caja or ADSafe or the markup restrictions for portlets, (2) place
the sandboxed components into an IFRAME whose domain or subdomain
differs from everything else on the page (ie, leveraging the browser
same-domain policy to achieve sandboxing). The problem with approach
#1 is that some functionality (potentially critical) is disabled and
developers have to in effect learn a new language. The problem with
approach #2 is that isolation is an all-or-nothing proposition and
there are shortcomings with IFRAMEs, such as lack of automatic
content sizing. Ian's proposal below addresses these IFRAME
shortcomings directly, which is great.
If I had time to think extensively about this issue (which I don't),
I would attempt to craft a proposal that used an approach where an
Ajax library performed the mapping between what exists today (i.e.,
IFRAME) and what would exist in the future, where Ajax libraries
could be eliminated once older browsers were put out of commission.
My initial thought would be to put a 'sandbox' attribute on a DIV
rather than on an IFRAME. That way, you end up with more powerful
sandboxing, along the lines of what Doug Crockford proposed with his
<module> tag. Newer browsers would deliver the sandboxing features
that Ian is proposing below. For older browsers, someone could
author an Ajax library that looked for DIV elements with a 'sandbox'
attribute, and under the hood transformed the DOM such that it
achieved sandboxing via IFRAMEs and implements the flexibility that
Ian describes in his proposal via typical ugly Ajax hacks, such as
passing messages via postMessage (or the even uglier fragment
identifer approach).
Jon
05/22/08 05:20 AM
[image removed]
To
[image removed]
[image removed]
cc
[image removed]
[image removed]
Subject
[image removed]
Re: The <iframe> element and sandboxing ideas
[image removed]
[image removed]
Post by Ian Hickson
* I've added a sandbox="" attribute to <iframe>, which by default
disables a number of features and takes a space-separated list of
[snip list]
Unless I'm missing something, this attribute is useless in practice
because legacy browsers will not impose the restrictions. This means
that as long as legacy browsers exist (i.e. forever) server-side
filtering must still be employed to duplicate the effects of the sandbox.
One alternative would be to use a different element name so that
fallback content can be provided for legacy browsers. In the short term,
<sandbox src="/comments/blah">
<iframe src="/comments/blah?do-security-filtering=1"></iframe>
</sandbox>
Once a large percentage of browsers support <sandbox> authors can start
to be less accommodating with their fallback content, either by
filtering out HTML tags entirely (which I'd assume is easier than just
filtering out script) or at the extreme just setting the fallback
content to be "Your browser is not supported".
This comment does not address "seamless", which seems to be orthogonal
and can thus be equally applied to both sandbox and iframe as currently
specified.
Collin Jackson
2008-05-27 01:02:05 UTC
I would assume that there are also
security issues with allowing the parent to override the styling of an
embedded iframe because conceivably someone could invoke a bank website
within an iframe and it wouldn't be good if the parent could override some
of the CSS for the bank's website. Similarly, you probably wouldn't want the
parent frame to be able to listen to keystrokes that happen within the child
iframe (e.g., your password).
Since the parent can already overlay password fields on top of the
sandboxed frame or replace it with a spoofed version, I don't think we
should encourage widgets to solicit passwords inside their sandboxed
frame if they don't trust their parent.

Collin Jackson
Boris Zbarsky
2008-05-22 16:27:00 UTC
Post by Ian Hickson
- by default, content in sandboxed browsing contexts, and any
browsing contexts nested in them
- content in those browsing contexts cannot create new browsing
contexts or open modal dialogs or alerts
?
Post by Ian Hickson
have a unique origin
(independent of the origin of their URI); this can be overridden
using the "allow-same-origin" keyword
So the parent page cannot script the contents of the iframe by default, right?
Post by Ian Hickson
- by default, script in those browsing contexts cannot run; this can
be overridden using the "allow-scripts" keyword
What happens if the parent page sets window.location to a javascript: URI on the
sandbox iframe? Does the script run? If so, in which browsing context?
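One way to pin down the default-deny semantics being asked about is to model the attribute as a set of recognized keywords. This is a minimal sketch, not spec text: the helper names are hypothetical, unknown tokens are assumed to be ignored, and the answer for javascript: URLs (they do nothing unless allow-scripts is present) follows Kristof Zelechovski's reply in this thread rather than the original proposal.

```python
from typing import FrozenSet

# The three keywords named in the proposal; everything else stays disabled.
KNOWN_KEYWORDS = frozenset({"allow-same-origin", "allow-forms", "allow-scripts"})


def parse_sandbox(value: str) -> FrozenSet[str]:
    """Split a space-separated sandbox="" value into recognized keywords.

    Assumption: unrecognized tokens are ignored, so a typo can never
    re-enable a feature.
    """
    return frozenset(token for token in value.split() if token in KNOWN_KEYWORDS)


def has_unique_origin(sandbox_value: str) -> bool:
    """Without allow-same-origin the framed content gets a unique origin,
    so the parent page cannot script into it either."""
    return "allow-same-origin" not in parse_sandbox(sandbox_value)


def javascript_url_runs(sandbox_value: str) -> bool:
    """Assumed behavior: navigating the frame to a javascript: URL executes
    nothing unless scripts were re-enabled with allow-scripts."""
    return "allow-scripts" in parse_sandbox(sandbox_value)
```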
Post by Ian Hickson
causes the iframe to size vertically to the bounding box
of the contents, and horizontally to the width of the container
I assume that the bounding box is computed after setting the width?

By "the width of the container" do you mean that the iframe computed width
should be equal to its containing block's computed width? Or that the
display:block non-replaced width algorithm from CSS should be used?
Post by Ian Hickson
and which causes the initial containing block of the contents to be
treated as zero height.
So percentage heights would end up being 0, while the iframe would be whatever
height is needed if one assumes they're auto?
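A toy model may make the ordering question concrete. Assuming text that simply wraps at the frame width (the `Block` type and its fields are illustrative, not from the proposal), the width has to be fixed first, because the height of the content's bounding box depends on it; and if the initial containing block is treated as 0px tall, a block whose percentage height resolves directly against it ends up 0px tall.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    """Illustrative content block: text of a given one-line width, or a % height."""
    text_width_px: int
    line_height_px: int
    percent_height: Optional[float] = None  # e.g. 50.0 for "height: 50%"


def seamless_iframe_box(container_width_px: int, blocks: List[Block]) -> Tuple[int, int]:
    # Step 1: the frame's width comes from its container, not from the content.
    width = container_width_px
    # The proposal treats the inner initial containing block as zero height,
    # so percentages resolved against it give 0.
    icb_height = 0
    height = 0
    for block in blocks:
        if block.percent_height is not None:
            height += int(icb_height * block.percent_height / 100)
        else:
            # Step 2: text wraps at the width fixed in step 1, so the content's
            # bounding box can only be measured after the width is known.
            lines = max(1, math.ceil(block.text_width_px / width))
            height += lines * block.line_height_px
    return width, height
```

In this model, narrowing the container increases the number of wrapped lines, which is why the height cannot be computed before the width.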
Post by Ian Hickson
and the style sheets that apply to the <iframe>
must also apply to the contents.
But the ' ' and '>' combinators don't cross the iframe boundary, right?
Post by Ian Hickson
This is all HIGHLY EXPERIMENTAL. I am looking for feedback on the general
approaches taken.
As someone else pointed out, this doesn't seem like it would be usable without
some UA sniffing or something, as things stand.
Post by Ian Hickson
There are various things that this doesn't address yet; e.g. there's no
way to force (or even allow) a non-seamless iframe to open links in the
parent window.
This attribute would
take a string which would then be interpreted as the source document
markup of an HTML document, much like the above
This seems very prone to security issues (injection of the closing quote in the
content) to me... The base64 approach is nice in that you can't shoot yourself
in the foot with it.
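For comparison, a minimal sketch of the base64 approach using Python's standard `base64` module; `to_data_uri` is a hypothetical helper name. Because the base64 alphabet contains only letters, digits, `+`, `/` and `=`, the encoded markup cannot contain a quote, `<` or `&`, so nothing in it can close the attribute it is placed in.

```python
import base64


def to_data_uri(markup: str) -> str:
    """Encode an HTML document as a base64 data: URI (RFC 2397 syntax).

    The payload uses only the base64 alphabet, so it can be dropped into a
    quoted attribute value without any further escaping.
    """
    payload = base64.b64encode(markup.encode("utf-8")).decode("ascii")
    return "data:text/html;charset=utf-8;base64," + payload
```

The result could then be used as, for example, the src="" of a sandboxed <iframe>.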

-Boris
Kristof Zelechovski
2008-05-22 17:19:23 UTC
1. Nested browsing contexts in a sandboxed frame cannot be created
dynamically but they can be defined by the inner markup.
2. If the frame is not allowed to execute scripts, setting location to
script should have no effect.
3. There is a potential discrepancy between taking the parent's width, which
is characteristic of block-level elements, and the element's declared level:
the level of a frame would then depend on an attribute. This is unprecedented:
the elements in HTML have a fixed level by design. Introducing a new
element should be reconsidered in view of that IMHO.
4. A percentage height scales to the container's height, not to the initial
dimensions of the current element. It is an error if the container's height
is left implicit or if the sum of percentages exceeds 100%.
5. The argument against SANDBOX is that the user could inject </SANDBOX>. The
argument against a code attribute is that the user could inject a quote.
Aren't these similar enough to reconsider SANDBOX?
It seems it is easier to sanitize quotes because the burden of quoting is on
the user.
Compare '<SANDBOX ><SANDBOX ></SANDBOX ></SANDBOX >' to '<SPAN
TITLE="&quot;" ></SPAN >', which must be converted to '"<SPAN
TITLE=&quot;&amp;quot;&quot; ></SPAN >"'. The quoting required seems
straightforward. I agree that using a data URL is simpler and cannot be
viewed as an obstacle to productivity since the author's text must be
processed anyway, so why not just encode it? And it is more consistent with
contemporary technology.
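The conversion in point 5 amounts to two string replacements done in a fixed order. A minimal sketch (the function name is hypothetical): `&` is escaped before `"`, because doing it the other way round would turn each freshly inserted `&quot;` into `&amp;quot;`.

```python
def escape_for_double_quoted_attribute(markup: str) -> str:
    """Minimal escaping so that `markup` can be placed inside attr="...".

    '&' is replaced first; otherwise the '&' of each '&quot;' added in the
    second step would itself be escaped again.
    """
    return markup.replace("&", "&amp;").replace('"', "&quot;")


if __name__ == "__main__":
    original = '<SPAN TITLE="&quot;" ></SPAN >'
    # Prints: "<SPAN TITLE=&quot;&amp;quot;&quot; ></SPAN >"
    print('"' + escape_for_double_quoted_attribute(original) + '"')
```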
HTH,
Chris

Boris Zbarsky
2008-05-23 03:19:12 UTC
Post by Kristof Zelechovski
1. Nested browsing contexts in a sandboxed frame cannot be created
dynamically but they can be defined by the inner markup.
There was no mention of "dynamically" in Ian's proposal. My assumption
was that "cannot create browsing contexts" meant just that. If it
doesn't, the wording needs some changes.
Post by Kristof Zelechovski
2. If the frame is not allowed to execute scripts, setting location to
script should have no effect.
OK. Again, that was not clear in the original proposal.
Post by Kristof Zelechovski
4. Percentage in height scales to the container's height, not to the initial
dimensions of the current element. It is an error if the container's height
is left implicit
It's not an error in CSS. Or are you suggesting a different algorithm?
Post by Kristof Zelechovski
or if the sum of percentages exceeds 100%.
Again, not a problem in CSS. Percentages of auto just get treated as
auto. If you're suggesting a totally different algorithm, it needs a
lot of fleshing out.
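This matches the CSS 2.1 rule for percentage heights: if the containing block's height is not specified explicitly (it depends on its content), a percentage height on a box that is not absolutely positioned computes to auto, and a value over 100% simply yields a taller box. A simplified sketch, with `None` standing in for a content-dependent containing-block height (an illustrative convention, not a CSS API).

```python
from typing import Optional, Union


def resolve_percentage_height(percent: float,
                              containing_block_height: Optional[float]) -> Union[float, str]:
    """Simplified CSS 2.1 resolution of `height: <percentage>` for a box that
    is not absolutely positioned.

    containing_block_height is None when that height depends on content; the
    percentage then computes to 'auto' rather than being an error.
    """
    if containing_block_height is None:
        return "auto"
    # Any percentage is allowed, including values above 100%.
    return containing_block_height * percent / 100.0
```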
Post by Kristof Zelechovski
5. The argument against SANDBOX is that the user could inject /SANDBOX. The
argument against code attribute is that the user could inject a quote.
Aren't these similar enough to reconsider SANDBOX?
SANDBOX and the non-base64 attribute thing seem pretty similar in a lot
of ways to me, except that the iframe (having a separate Window and
such) might be easier to secure in existing implementations.

-Boris
Ojan Vafai
2008-05-24 17:55:37 UTC
Post by Ian Hickson
* I've added a seamless="" boolean attribute to <iframe>, which, if
the content's active document's URI has the same origin as the
container, causes the iframe to size vertically to the bounding box
of the contents, and horizontally to the width of the container,
and which causes the initial containing block of the contents to be
treated as zero height. In addition, styles on the root element of
the content must inherit from the <iframe> instead of being the
initial values, and the style sheets that apply to the <iframe>
must also apply to the contents. In addition, any time the browsing
context navigates itself, the parent browsing context gets
navigated instead.
This looks awesome.

So, the whole point of these is defining elements that are isolated from
their surrounding context on different axes. Same origin iframes currently
just give you CSS isolation. sandbox affords script isolation. seamless
affords the ability to turn off the CSS isolation.

Seems to me that we need a third property which controls event isolation.
Currently events don't propagate in/out of iframes and event coordinates are
all relative to the iframe's viewport (e.g. on mouse events).

My first intuition was that seamless should also just propagate events and
have mouse coordinates be relative to the parent browsing context. But I can
think of cases where you would want to control the two separately, for
example, if you are especially concerned about performance and don't want
events in the parent browsing context to be handled by the iframe's
contents.
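The three axes can be modeled as independent switches. In this sketch `IsolationPolicy`, `events_isolated` and the helper functions are hypothetical (no such event control exists in the proposal); `frame_left`/`frame_top` are assumed to be the position of the iframe's content box within the parent's viewport, so a mouse position relative to the iframe's viewport maps into the parent by a simple offset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IsolationPolicy:
    """Hypothetical model of the isolation axes discussed above."""
    script_isolated: bool  # what sandbox="" provides
    css_isolated: bool     # the default for iframes; seamless="" turns it off
    events_isolated: bool  # the proposed third, independent switch


def to_parent_coordinates(x: float, y: float,
                          frame_left: float, frame_top: float) -> Tuple[float, float]:
    """Map a point relative to the iframe's viewport into the parent's viewport."""
    return (x + frame_left, y + frame_top)


def event_seen_by_parent(policy: IsolationPolicy, x: float, y: float,
                         frame_left: float, frame_top: float) -> Optional[Tuple[float, float]]:
    """Coordinates the parent would observe, or None if the event stays inside."""
    if policy.events_isolated:
        return None
    return to_parent_coordinates(x, y, frame_left, frame_top)
```

Keeping `css_isolated` and `events_isolated` separate is what allows the performance case above: a seamless frame whose events stay inside.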

Ojan
Ojan Vafai
2008-05-26 03:44:56 UTC
A couple more thoughts.

1. When seamless is set, the compatMode of the iframe should be the same as
that of the parent browsing context, even if the doctype of the iframe would
put it in a different compatMode than its parent.
2. The corollary to this is that when seamless is not set, the compatMode of
iframes created from JS should be BackCompat unless a doctype is
document.write'ed in. This is what every browser does currently AFAIK.
3. The behavior of seamless in the face of different overflow values needs
to be spec'ed as well. I think the current spec deals well with
overflow:visible. But if overflow is scroll or auto, then it should behave
the way a div does with overflow scroll or auto (i.e. not size its height
to its contents).
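Point 3 can be read as the following rule. A sketch under stated assumptions: the function name is hypothetical, heights are plain pixel integers, an unset height is represented by `None`, and overflow values not mentioned (such as `hidden`) are deliberately left unspecified.

```python
from typing import Optional


def seamless_height(overflow: str, content_height: int,
                    specified_height: Optional[int]) -> int:
    """Height of a seamless iframe under the suggested rule (not spec text).

    overflow: visible      -> grow to fit the content, as the proposal says.
    overflow: scroll/auto  -> behave like a <div> with that overflow: keep the
                              author-specified height and let the content scroll.
    """
    if overflow == "visible":
        return content_height
    if overflow in ("scroll", "auto"):
        if specified_height is None:
            # Assumption: with no specified height, a div is as tall as its content.
            return content_height
        return specified_height
    raise ValueError(f"overflow value not covered by this rule: {overflow}")
```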

Ojan
Ojan Vafai
2008-05-26 18:59:01 UTC
Revising a comment I made yesterday. A couple of other things came to mind.
What happens if an iframe is loaded with sandbox set and then the attribute
is unset? What security origin is it in? Similarly, what happens when
seamless is set/removed on an iframe already in the page? Does it start
inheriting CSS and resize to fit its content? I don't feel strongly about
what should happen in these cases, but it seems worth being explicit.
Post by Ojan Vafai
1. When seamless is set, the compatMode of the iframe should be the same as
that of the parent browsing context, even if the doctype of the iframe would
put it in a different compatmode than its parent.
I thought about this some more and this seems like a bad idea. If you
actually link to a page that expects to be in quirks mode from a standards
parent, then this could break things. I'll modify this to the following:

Iframes with an empty src (or no src property) should inherit their parent's
compatmode iff seamless is set, otherwise they should be in backcompat
unless a standards doctype is document.write'ed in.

Again the latter part of that is for compatibility with current browsers.
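The revised rule can be written out as a small decision function. `document.compatMode` really uses the strings `"CSS1Compat"` (standards mode) and `"BackCompat"` (quirks mode); the function and its parameters are hypothetical, and iframes with a non-empty src are outside the rule, so the sketch rejects them.

```python
from typing import Optional


def initial_compat_mode(src: Optional[str], seamless: bool, parent_mode: str,
                        standards_doctype_written: bool) -> str:
    """Revised rule for iframes with an empty or missing src (a sketch).

    Returns a document.compatMode string: "CSS1Compat" or "BackCompat".
    """
    if src:
        raise ValueError("only iframes with an empty or missing src are covered")
    if seamless:
        # Inherit the parent's mode iff seamless is set.
        return parent_mode
    # Otherwise quirks, unless a standards doctype was document.write'ed in.
    return "CSS1Compat" if standards_doctype_written else "BackCompat"
```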
Jonas Sicking
2008-05-28 22:23:28 UTC
Post by Ian Hickson
There's a big difference to that and to what I'm proposing. With what's
in bug 80713 you're still limited to a box that basically doesn't take
part in the outer page at all. For example, in the table example in my
original post the headers of the table would not resize to fit the
column sizes in the <include>ed table.
Woah. That's far more radical. I have no idea how to do that. How would
you make the parser not generate the implied elements and switch straight
to the "in table" mode? How would you make the CSS model work with this?
How would you define conformance for the document fragments?
The parser questions here are interesting for sure, but I believe they
could be solved.

One way to solve the "don't make the parser switch into mode X when it
hits the iframe" would be to teach the parser about <include> (or
<iframe seamless>, or <iframe include>, or whatever it'll be called).
That is pretty ugly though.

One way to solve the fragment issue would be to say that the inner
document always has to be a full document, and then use a fragment
identifier to point to the contents of a table.

The CSS model is simpler. XBL deals with exactly the same problem of
combining multiple DOMs into a single flattened tree on which CSS is
applied.

I'm still intending to do some testing with this idea once I get more
time. A lot of the implementation details have to be solved for XBL anyway.

/ Jonas
Andrew Fedoniouk
2008-05-30 02:40:18 UTC
That is known as a "client-side include":

<include src="data.partial.htm">
Ooops, "data.partial.htm" is not available
</include>

After data.partial.htm loads, the <include> node is replaced by
the content of data.partial.htm.

Simple and straightforward.
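A minimal sketch of that replacement step, operating on markup strings with a regular expression for brevity (a browser would do this on the DOM). `expand_includes` and the `fetch` callback, which returns `None` when a document is unavailable, are hypothetical; nested <include> elements are not handled.

```python
import re
from typing import Callable, Optional

# <include src="...">fallback</include>; non-greedy, so nested includes are not supported.
INCLUDE_RE = re.compile(r'<include\s+src="([^"]*)"\s*>(.*?)</include>',
                        re.DOTALL | re.IGNORECASE)


def expand_includes(markup: str, fetch: Callable[[str], Optional[str]]) -> str:
    """Replace each <include> with the fetched document, or with its fallback
    content when fetch() returns None."""
    def substitute(match):
        src, fallback = match.group(1), match.group(2)
        content = fetch(src)
        return fallback if content is None else content

    return INCLUDE_RE.sub(substitute, markup)
```

Note that the included markup becomes part of the including document, so it gets none of the origin or script isolation that sandbox="" provides.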
--
Andrew Fedoniouk.

http://terrainformatica.com
Mike Ter Louw
2008-07-02 15:42:08 UTC
Post by Ian Hickson
* I've added a seamless="" boolean attribute to <iframe>, which, if
the content's active document's URI has the same origin as the
container, causes the iframe to size vertically to the bounding box
of the contents, and horizontally to the width of the container,
and which causes the initial containing block of the contents to be
treated as zero height. In addition, styles on the root element of
the content must inherit from the <iframe> instead of being the
initial values, and the style sheets that apply to the <iframe>
must also apply to the contents. In addition, any time the browsing
context navigates itself, the parent browsing context gets
navigated instead.
This is all HIGHLY EXPERIMENTAL. I am looking for feedback on the general
approaches taken.
The approach is good in that it leverages the existing semantics of
<iframe> as a document / browsing context boundary, and enables tweaking
of content capabilities using a small number of options. The options
are general enough that their effects should not be difficult for web
authors to understand and put to use.

In the spec, would an example of seamless embedding of untrusted content
be useful?

"
In this example, the site www.example.com seamlessly embeds untrusted,
user-generated content such that the untrusted content cannot
instantiate plugins, execute scripts, submit forms, etc. Note that for the
seamless attribute to be used in conjunction with the sandbox attribute,
the allow-same-origin keyword must be set and the parent browsing context's
active document must have the same origin as the nested browsing context.

<iframe src="http://www.example.com/getBlogComments.cgi?article=123"
seamless sandbox="allow-same-origin"></iframe>
"
Post by Ian Hickson
There are various things that this doesn't address yet; e.g. there's no
way to force (or even allow) a non-seamless iframe to open links in the
parent window.
There also does not seem to be a way for embedding untrusted content in
a unique browsing context (i.e., different origin) that allows scripting
and is seamless with the surrounding document.

There is a constraint placed on use of the seamless attribute: its effects
are not applied when the <iframe> is of a different origin. Is this to
protect web authors from unintentional data leaks, or to prevent
self-contradiction of the standard in some way?

To me, the exciting thing about content restrictions is the ability for
a web author, or automated security tools, to define a policy for the
user agent to enforce. Toward this end the standards should allow great
flexibility in how content restriction features can be used. Maybe we
can allow the choice to trade "total security lock-down" for "very good
security, some documented implicit data flows, but greater usability"?

Here's another perspective: Is HTML 5 going to provide sufficient
flexibility to enable web authors to safely embed untrusted content, or
will future generations of web apps continue to rely on content
filtering/sanitization techniques for restricting capabilities of
untrusted content?

Just to be clear, I think the current proposal is a great improvement
over current browser support for content restrictions and will enable a
wave of desperately needed security enhancements on the web.
Post by Ian Hickson
This isn't very readable, I'll grant you. I'm thinking of introducing a
new attribute. I haven't worked out what to call it yet, but definitely
not "src", "source", "src2", "content", "value", or "data" -- maybe
"html" or "doc", though neither of those are great. This attribute would
take a string which would then be interpreted as the source document
markup of an HTML document, much like the above; it would override src=""
This new attribute, along with some form of content encoding (e.g., data
URI scheme), could be very important to the usefulness of the seamless
and sandbox attributes in some applications. Is the hold up just
indecision about naming? How about "text" or "document"?

Mike
Kristof Zelechovski
2008-07-03 07:59:32 UTC
For the record:
The Microsoft HTML engine supports the following syntax:
IFRAME src="about:<HTML >.</HTML >".
Chris

Collin Jackson
2008-07-03 17:28:31 UTC
On Thu, Jul 3, 2008 at 12:59 AM, Kristof Zelechovski
Post by Kristof Zelechovski
IFRAME src="about:<HTML >.</HTML >".
I'd like to learn more about this. I wasn't able to reproduce it in
IE. Is it documented somewhere?

Collin Jackson
Kristof Zelechovski
2008-07-04 06:41:56 UTC
I deliberately used the term "HTML engine" because the Internet Explorer
team decided to disable this feature. However, it is still there if you use
it with another front end, e.g. as an HTML application.
The semantics of the about scheme is documented in the KB (not on MSDN,
except for my note).
HTH
Chris
