Here I am again with some new thoughts about developing web browsers...
Here is another screenshot of my browser. As you can see, it already renders a simple page okayish.
The scrollbar is missing. No idea why it isn't visible.
Page rendering
In its current state, the browser renders pages using simple Windows Forms controls. Elements like div, b, p and such are Panels. Text inside a b node is a Label inside a Panel.
This method has one major flaw: text formatting. Take a page which has a really really long text node. How would that be rendered on multiple lines? I could make the Label taller... but...
If the page had something like..
<i>Hello</i>, here is a really long sentence which needs to be on multiple lines
..then we'd be in trouble. This is because we'd first have a small Panel which would contain the text Hello and then another Panel which would contain the rest of the text.
Imagine stretching the second box higher there... this is what happens:
So it would just look completely stupid. The reason is that the second Panel is a plain rectangle: it cannot magically start from the left edge on the second row but not on the first.
So...
GDI+ to the rescue
To overcome the problem, I will have to trash the current rendering model, which has the parts of the page as separate controls. I'd have to make a single über control which renders the whole page using some kind of magic!
That magic I'll call GDI+, which is the set of drawing tools in the .NET Framework that I can use to render custom controls or whatever I want. (Or something like that, google for GDI+.)
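To show roughly what the single control would have to do with the example above, here is a minimal sketch. It is written in Python rather than C#, and it is not my actual code. The idea: instead of one box per node, flatten the page into styled words and flow them onto lines, so the second line can start at the left edge. The fixed CHAR_WIDTH and LINE_HEIGHT values and the Word and flow names are assumptions for illustration; a real control would measure text with something like Graphics.MeasureString.

```python
from dataclasses import dataclass

CHAR_WIDTH = 8     # assumed fixed glyph width in pixels (real code would measure text)
LINE_HEIGHT = 16   # assumed fixed line height in pixels


@dataclass
class Word:
    text: str
    style: str     # e.g. "italic" or "normal"
    x: int = 0
    y: int = 0


def flow(runs, max_width):
    """Place the words of (text, style) runs on lines no wider than max_width."""
    placed = []
    x = y = 0
    for text, style in runs:
        # Simplification: every token is followed by one space, so the
        # comma after "Hello" ends up as its own word.
        for token in text.split():
            width = len(token) * CHAR_WIDTH
            # Wrap to the left edge of the next line if the word doesn't fit.
            if x > 0 and x + width > max_width:
                x = 0
                y += LINE_HEIGHT
            placed.append(Word(token, style, x, y))
            x += width + CHAR_WIDTH  # advance past the word plus one space
    return placed


runs = [
    ("Hello", "italic"),
    (", here is a really long sentence which needs to be on multiple lines", "normal"),
]
for word in flow(runs, max_width=200):
    print(word.y // LINE_HEIGHT, word.x, word.style, word.text)
```

Because each word carries its own style and position, the italic Hello and the normal text share the first line, and the wrapped words simply continue from x = 0 on the next one. That is exactly what two separate Panels cannot do.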
So... what I'll do is... begin creating a custom super duper über control, which is made of win and good. It's gonna be one hell of a job, but I don't care. As long as I have fun... ;)
For starters, I have to make it render text. Okay. Then, if I'd want to make the text selectable, I'd have to make a custom routine which would highlight the text... and check where the mouse cursor actually was to determine what text to highlight.
And it would also need to understand clicks on certain parts of the text, like links.
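Both selection and link clicks boil down to the same question: which piece of text is under the mouse? Here is a rough sketch of that hit testing, again in Python and not my real code. The Box class, the hit_test function and the example URL are all made up for illustration, and the boxes are assumed to come out of whatever layout step positioned the words.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    text: str
    x: int
    y: int
    width: int
    height: int
    href: Optional[str] = None  # set when the word is part of a link

    def contains(self, px, py):
        # Left/top edges are inside the box, right/bottom edges are not.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def hit_test(boxes, px, py):
    """Return the box under the point (px, py), or None if nothing is there."""
    for box in boxes:
        if box.contains(px, py):
            return box
    return None


# Hypothetical layout result: three words on one 16px-high line.
boxes = [
    Box("Read", 0, 0, 32, 16),
    Box("more", 40, 0, 32, 16, href="http://example.com/more"),
    Box("here", 80, 0, 32, 16),
]

clicked = hit_test(boxes, 50, 8)
if clicked is not None and clicked.href:
    print("navigate to", clicked.href)
```

A linear scan like this is probably fine for small pages. For text selection the same lookup would run on mouse down and on every mouse move, and everything between the two boxes found would get highlighted.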
After that... images... borders... whatever comes to my mind... craziness ensues.
Oh well. It's gonna be lots of work, and if I ever get to the point where my custom control renders the page like the browser does now with the Windows Forms controls... I'll be happy. :]
And regarding HTML parsing...
I was originally going to use the .NET Framework's built-in System.Xml features to parse documents, but considering nobody knows how to write valid HTML, let alone valid XHTML, I trashed the idea and wrote my own parser.
After all, I do want the browser to support documents that aren't 100% valid too. My parser seems quite good (for one written by me): it reads the document, gets element values and attributes and seems efficient. However, I have no idea at all whether anyone other than me would consider the way it parses efficient ;)
At the moment, it takes HTML data and just starts going through it, one character at a time, using a StringReader. When it encounters a < character, it goes into "read tag" mode, reading until a > is encountered (or EOF. EOF is always checked).
After reading the data between < and >, it tries to figure out the element type in question and the attributes of the element (if any). Then it checks whether the element is one which can contain text or child nodes (e.g. an img element cannot).
If it can contain child nodes, the newly read element becomes the current parent node, to which new nodes are appended until a closing tag for the element is encountered.
Text data is considered to start after a >. Everything that comes after that and is not a < is treated as text for the current parent node. When the next < is encountered, that text is converted into a text node and appended to the parent.
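Here is a minimal sketch of that loop, in Python instead of the real C# with StringReader, so treat it as an illustration and not the actual parser. The Node class, the parse and parse_tag functions and the VOID_ELEMENTS list are names I made up for this example. It skips things a real parser needs, such as comments, doctypes, scripts and attribute values containing spaces.

```python
# Assumed list of elements that cannot contain text or child nodes.
VOID_ELEMENTS = {"img", "br", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, name, attrs=None, text=None):
        self.name = name            # tag name, or "#text" for text nodes
        self.attrs = attrs or {}
        self.text = text
        self.children = []


def parse_tag(raw):
    """Split the text between < and > into a tag name and an attribute dict."""
    parts = raw.strip().rstrip("/").split()
    if not parts:
        return "", {}
    attrs = {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        attrs[key.lower()] = value.strip("\"'")
    return parts[0].lower(), attrs


def parse(html):
    root = Node("#document")
    stack = [root]                  # stack[-1] is the current parent node
    text = []
    i = 0
    while i < len(html):
        if html[i] != "<":
            text.append(html[i])    # plain text for the current parent
            i += 1
            continue
        # A "<" ends any pending text, which becomes a text node.
        if text:
            stack[-1].children.append(Node("#text", text="".join(text)))
            text = []
        # "Read tag" mode: read until ">" or the end of the input.
        end = html.find(">", i + 1)
        if end == -1:
            end = len(html)
        raw = html[i + 1:end]
        i = end + 1
        if raw.startswith("/"):
            # Closing tag: go back to the parent of the matching open element.
            name = raw[1:].strip().lower()
            for depth in range(len(stack) - 1, 0, -1):
                if stack[depth].name == name:
                    del stack[depth:]
                    break
            continue
        name, attrs = parse_tag(raw)
        if not name:
            continue
        node = Node(name, attrs)
        stack[-1].children.append(node)
        if name not in VOID_ELEMENTS:
            stack.append(node)      # the new element becomes the parent
    if text:
        stack[-1].children.append(Node("#text", text="".join(text)))
    return root


doc = parse('<p><i>Hello</i>, here is <img src="a.png"> a picture')
print([child.name for child in doc.children[0].children])
```

Keeping the open elements on a stack is what makes sloppy markup survivable in this sketch: a closing tag with no matching open element is ignored, and elements still open at the end of the input are simply left as they are.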
That's basically it. Any comments and/or suggestions more than welcome.
Whoah, that was a rather long post. My work is soon over for today and it's time to head home!
ps. a "Read More..." link might be good