Semantic Web Using HTML5


What are Semantic Computing and Semantic Web?

Semantic Computing combines several computer science fields, such as natural language processing, data mining, and semantic analysis. Its purpose is to let people use computers in a more natural and intuitive way. For example, we might ask a computer to explain the differences between two things, or simply ask it to carry out a task for us. Semantic Computing is closely related to the concept of the Semantic Web. By Semantic Web, I mean the Internet as Tim Berners-Lee, the director of the W3C, described it:

"I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize."

How does HTML5 come into the picture?

When the HTML5 drafts were written, the authors wanted to support a smarter Web, so they extended the HTML4 standard with so-called semantic elements. This considerably changed the way we build Web applications. You probably remember the days when website layouts were built with HTML <table> elements (using tables for page layout is widely considered bad practice). Generic containers such as <table> and <div> make semantic parsing of a Web page difficult, because they say nothing about their content. Developers often compensated for the lack of real semantic tags by giving div elements meaningful CSS class names, such as class="header" or class="menu", to guide indexers.

Thankfully, HTML5 added many new tags to the standard and extended some existing ones. One of the most important is the <meta> tag, which already existed in HTML4; HTML5 added a shorter form for declaring the character encoding. <meta> tags belong inside the <head> element and describe the content of a Web page. The most common meta tags are:

<meta charset="UTF-8">
<meta name="author" content="John Doe">
<meta name="description" content="John Doe's personal website.">
<meta name="keywords" content="freelancing, web design, photoshop, css, html5, design">

The charset meta tag tells the browser which character encoding to use when displaying the text on the Web page. The author, description, and keywords meta tags are self-explanatory. They give semantic Web crawlers and search engines more context about your page, which helps them index it better (although some major search engines, Google among them, ignore the keywords tag when ranking pages).
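
To illustrate how an indexer might read these tags, here is a minimal sketch using Python's standard html.parser module. MetaExtractor and extract_meta are hypothetical names chosen for this example; real search engines use their own, far more elaborate pipelines. The sketch only shows that the name/content pairs are directly machine-readable.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Records the charset and the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.charset = None
        self.meta = {}  # lower-cased meta "name" -> "content"

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # html.parser already lower-cases attribute names
        if "charset" in attributes:
            self.charset = attributes["charset"]
        elif attributes.get("name"):
            self.meta[attributes["name"].lower()] = attributes.get("content") or ""


def extract_meta(html):
    """Return the metadata a simple indexer might keep, keywords split into a list."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    raw_keywords = parser.meta.get("keywords", "")
    return {
        "charset": parser.charset,
        "author": parser.meta.get("author"),
        "description": parser.meta.get("description"),
        "keywords": [k.strip() for k in raw_keywords.split(",") if k.strip()],
    }


page = """<head>
<meta charset="UTF-8">
<meta name="author" content="John Doe">
<meta name="description" content="John Doe's personal website.">
<meta name="keywords" content="freelancing, web design, photoshop, css, html5, design">
</head>"""

metadata = extract_meta(page)
print(metadata["author"], metadata["keywords"])
```

Splitting the keywords on commas is an assumption about how an indexer would tokenize them; the HTML standard only defines the attribute as a comma-separated list.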

Navigation also changed in HTML5. The new <nav> element tells content parsers that the text or image links inside it are used for navigating between the pages or sections of the site. When semantic Web scrapers parse your website, they can use these navigation links to work out how your pages relate to each other and to the actual content.
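
A minimal Python sketch of that idea, assuming a scraper that simply separates links found inside <nav> (site structure) from links found elsewhere (content). LinkClassifier and the sample URLs are hypothetical; the point is that the <nav> element alone is enough to make the distinction, with no guessing from CSS class names.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Sorts <a href> links into navigation links and content links."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0      # greater than 0 while inside a <nav> element
        self.nav_links = []
        self.content_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                target = self.nav_links if self.nav_depth else self.content_links
                target.append(href)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1


page = """
<nav>
  <a href="/">Home</a>
  <a href="/blog">Blog</a>
  <a href="/contact">Contact</a>
</nav>
<p>Read my <a href="/blog/html5-semantics">post on HTML5 semantics</a>.</p>
"""

classifier = LinkClassifier()
classifier.feed(page)
classifier.close()
print("Site structure:", classifier.nav_links)
print("Content links:", classifier.content_links)
```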

Because many websites publish articles, blog posts, news, and user comments, I used the following example to present the HTML5 semantic tags for this kind of content:

<article>
  <header>
    <h1>My Article's Header</h1>
    <p>Published on: <time datetime="2015-02-19">19th February, 2015</time></p>
  </header>
  <p>Once upon a time there were...</p>
  <h2>User Feedback (Comments)</h2>
  <article>
    <header><h3>Posted by: John Doe</h3>
      <p>at <time datetime="2015-02-20 21:33">21:33, 20th February, 2015</time></p></header>
    <p>I think the story, once upon a time has a really good and meaningful...</p>
  </article>
</article>

The most important tag is <article>: it tells search engines and semantic Web scrapers that its content is a self-contained article. Articles usually have a <header> element that holds the title and publication date. The <time> element stores date and time information in a machine-readable datetime attribute, and because it sits inside the article's <header>, parsers can tell that this date belongs to the article. An article's content is usually made up of paragraphs and images, but articles can also contain multiple sections, video, or audio. User comments are often marked up as nested <article> elements too.
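
The following minimal Python sketch shows why the placement matters: a parser can attach each title and date to the right article purely from the nesting of <article>, <header>, and <time>. It assumes a scraper that treats every nested <article> as a comment on its parent, which is a simplification (nested articles can mean other things). ArticleExtractor is a hypothetical name, and only the standard library is used.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ArticleExtractor(HTMLParser):
    """Finds each <article> and reads the title and date from its own <header>."""

    def __init__(self):
        super().__init__()
        self.articles = []         # every article, in document order
        self.open_articles = []    # stack of currently open articles
        self.header_owner = None   # article whose <header> is open, if any
        self.heading_owner = None  # article whose title text is being read

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            record = {"title": "", "published": None, "depth": len(self.open_articles)}
            self.articles.append(record)
            self.open_articles.append(record)
        elif tag == "header" and self.open_articles:
            # A <header> belongs to the innermost open <article>.
            self.header_owner = self.open_articles[-1]
        elif self.header_owner is not None:
            if tag in HEADINGS and not self.header_owner["title"]:
                self.heading_owner = self.header_owner
            elif tag == "time" and self.header_owner["published"] is None:
                # The machine-readable value lives in the datetime attribute.
                self.header_owner["published"] = dict(attrs).get("datetime")

    def handle_endtag(self, tag):
        if tag == "article" and self.open_articles:
            self.open_articles.pop()
        elif tag == "header":
            self.header_owner = None
        elif tag in HEADINGS:
            self.heading_owner = None

    def handle_data(self, data):
        if self.heading_owner is not None:
            self.heading_owner["title"] += data


page = """
<article>
  <header>
    <h1>My Article's Header</h1>
    <p>Published on: <time datetime="2015-02-19">19th February, 2015</time></p>
  </header>
  <p>Once upon a time there were...</p>
  <h2>User Feedback (Comments)</h2>
  <article>
    <header>
      <h3>Posted by: John Doe</h3>
      <p>at <time datetime="2015-02-20 21:33">21:33, 20th February, 2015</time></p>
    </header>
    <p>I think the story, once upon a time has a really good and meaningful...</p>
  </article>
</article>
"""

extractor = ArticleExtractor()
extractor.feed(page)
extractor.close()
for article in extractor.articles:
    # Assumption: any nested article is treated as a comment on its parent.
    kind = "comment" if article["depth"] > 0 else "article"
    print(kind, "|", article["title"].strip(), "|", article["published"])
```

Note that the <h2> heading "User Feedback (Comments)" is not picked up as a title, because it sits outside every <header>.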

The bandwidth available for accessing the World Wide Web has grown tremendously, and users have started to consume other content types such as high-resolution pictures, music, and video streams. The <video> and <audio> tags were introduced so that browsers can stream and play media content directly on a Web page.

<audio src="mysongsmix.mp3" preload="auto" controls></audio>
<video controls>
  <source src="mysong.ogg" type="video/ogg">
  <source src="mysong.mp4" type="video/mp4">
  Your browser does not support the <code>video</code> element. Please upgrade or move to a new browser in order to play the video. Thank you.
</video>

The <audio> and <video> elements in the example use the controls attribute, which tells the browser to display its built-in playback controls. Instead of a single src attribute, a media element can list multiple <source> elements, because not every browser can play every media format. The browser checks the sources in order, skips any whose type it cannot play, and loads the first one it supports. The text inside <video> is only shown by older browsers that do not support the <video> element at all.
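
The source selection described above can be modeled with a short Python sketch. It is a simplified model, not the browser algorithm itself: real browsers run the HTML standard's resource selection algorithm, query their own codec support, and also move on to the next source if a file fails to load. SourceCollector, pick_source, and the two capability sets are hypothetical.

```python
from html.parser import HTMLParser


class SourceCollector(HTMLParser):
    """Collects (src, type) pairs from <source> elements in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "source":
            attributes = dict(attrs)
            self.sources.append((attributes.get("src"), attributes.get("type")))


def pick_source(sources, supported_types):
    """Simplified model: return the first source whose declared MIME type
    is supported, or None if nothing matches."""
    for src, mime_type in sources:
        # A source without a type cannot be ruled out up front, so a browser
        # would try to fetch it; this model picks it as well.
        if mime_type is None or mime_type in supported_types:
            return src
    return None


video_html = """<video controls>
  <source src="mysong.ogg" type="video/ogg">
  <source src="mysong.mp4" type="video/mp4">
  Your browser does not support the <code>video</code> element.
</video>"""

collector = SourceCollector()
collector.feed(video_html)
collector.close()

# Hypothetical capability sets for two different browsers.
print(pick_source(collector.sources, {"video/mp4"}))               # mysong.mp4
print(pick_source(collector.sources, {"video/ogg", "video/mp4"}))  # mysong.ogg
```

Because the first supported entry wins, the order of the <source> elements expresses the author's preference.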

By using these elements on your Web page, you make it easier for search engines, semantic Web scrapers, and indexers to parse your website, which helps ensure that your content is correctly categorized and indexed on the Internet.

Posted 21 April, 2015

Greg Bogdan

Software Engineer, Blogger, Tech Enthusiast

I am a Software Engineer with over 7 years of experience in different domains (ERP, Financial Products and Alerting Systems). My main expertise is .NET, Java, Python and JavaScript. I like technical writing and have good experience in creating tutorials and how-to technical articles. I am passionate about technology, I love what I do, and I always intend to 100% fulfill the project which I am ...
