Lecture 13 Introduction to Web

Basic Items

(1) Web (World Wide Web): A collection of data and services

(2) The web is not the Internet

  • The Internet describes how data is transported between servers and browsers
  • We will study the Internet later in the networking unit

(3) Elements of the Web

  • URLs: uniquely identify a piece of data on the web
  • HTTP: the standard for how web browsers communicate with web servers
  • Data on a webpage can contain:
    • HTML: A markup language for creating webpages
    • CSS: A style sheet language for defining the appearance of webpages
    • JavaScript: A programming language for running code in the web browser

URLs

URL (Uniform Resource Locator): A string that uniquely identifies one piece of data on the web

A URL contains:

  • Scheme
  • Domain
  • Location
  • Path
  • Query
  • Fragment

We will introduce them one by one :)

Scheme

Scheme (also called the protocol)

  1. Located just before the double slashes
  2. Defines how to retrieve the data over the Internet (namely, which Internet protocol to use)
  3. Common schemes: http (unencrypted) or https (secure, encrypted)

Example:

https://toon.cs161.org/xorcist/avian.html

Here, https is the scheme: the browser retrieves the data using the HTTPS protocol.

Domain

Domain (the domain name)

  1. Located after the double slashes, but before the next single slash
  2. Defines which web server to contact
  3. Written as several labels separated by dots

Example:

https://toon.cs161.org/xorcist/avian.html

Here, toon.cs161.org is the domain.

Location

Location (server-level): identifies which server hosts the resource

Location: The domain with some additional information

The additional information takes one of two forms:

  1. Username: <username>@toon.cs161.org
    • Identifies one specific user on the web server
    • Rarely seen
  2. Port: toon.cs161.org:4000
    • Identifies one specific application on the web server
    • We will see ports again in the networking unit

Example:

https://toon.cs161.org:4000/xorcist/avian.html

Here, toon.cs161.org:4000 is the location.

Path

Path (file-level): identifies where the resource lives on the server

  1. Starts at the first single slash after the location
  2. Defines which file on the web server to fetch
    • Think of the web server as having its own filesystem
    • The path represents a file path on the web server's filesystem
  3. Examples
    • https://toon.cs161.org/xorcist/avian.html: Look in the xorcist folder for avian.html
    • https://toon.cs161.org/: Return the root directory /

Query

Query: passes arguments to the web server

  1. Providing a query is optional
  2. Located after a question mark
  3. Supplies arguments to the web server for processing
    • Think of the web server as offering a function at a given path
    • To access this function, a user makes a request to the path, with some arguments in the query
    • The web server runs the function with the user's arguments and returns the result to the user
  4. Form: Arguments are supplied as name=value pairs, separated with ampersands (&)

Example:

https://toon.cs161.org/draw?character=evan&size=big

Here, character=evan&size=big is the query. It means: call the draw function with character set to evan and size set to big.
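
As a rough illustration of the name=value form, the sketch below uses Python's standard urllib.parse module to split the example URL and decode its query. How the web server actually processes the arguments is not specified in these notes; this only shows how they could be recovered from the URL.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://toon.cs161.org/draw?character=evan&size=big"
parts = urlsplit(url)

# parse_qs maps each name to a list of values, because a name may repeat.
args = parse_qs(parts.query)

print(parts.path)  # /draw
print(args)        # {'character': ['evan'], 'size': ['big']}
```

The path selects the function (draw) and the decoded dictionary holds its arguments.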

Fragment

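Fragment: a local navigation marker. It does not interact with the server; instead, it tells the browser how to locate specific content within the page, or passes arguments to JavaScript code.

  1. Providing a fragment is optional
  2. Located after a hash sign (#)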
  3. Not sent to the web server! Only used by the web browser
    • Common usage: Tells the web browser to scroll to a part of a webpage
    • Another usage: Supplies content to code in the web browser (JavaScript) without sending the content to the server

URL Escaping

URLs have special characters (?, #, /)

What if we want to use a special character in the URL?

Solution: URL encoding (also called percent-encoding)

  • Notation: Percent sign (%) followed by the hexadecimal value of the character
  • Example:
    • %20 = ' ' (space)
    • %23 = '#' (hash sign)
    • %32 = '2' (printable characters can also be encoded)

URL encoding raises some security issues: because the same URL can be written in many encoded forms, scanning for malicious URLs becomes harder

We will talk about this later
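
To make the notation concrete, here is a minimal sketch using quote and unquote from Python's standard urllib.parse module. The sample strings are made up for illustration, apart from the avian.html path borrowed from the earlier examples.

```python
from urllib.parse import quote, unquote

# Decoding: each %XX is replaced by the character with that hexadecimal value.
decoded_space = unquote("%20")  # ' '
decoded_hash = unquote("%23")   # '#'
decoded_two = unquote("%32")    # '2'

# Encoding: safe="" asks quote() to escape reserved characters such as "/" too.
encoded = quote("a/b?c#d e", safe="")
print(encoded)  # a%2Fb%3Fc%23d%20e

# Two different spellings that decode to the same path (%61 is 'a').
same_path = unquote("/xorcist/%61vian.html") == "/xorcist/avian.html"
print(same_path)  # True
```

The last comparison hints at why filtering is harder: a scanner that only looks for the literal text avian.html would miss the encoded spelling.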

Summary of URL

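A full URL has the general form scheme://username@domain:port/path?query#fragment, where the username, port, query, and fragment are optional.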
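
The components can also be pulled apart programmatically. The sketch below uses urlsplit from Python's standard urllib.parse module on a hypothetical URL that combines the earlier examples; the username alice and the fragment top are invented for illustration. Note that the library calls the lecture's "location" the netloc.

```python
from urllib.parse import urlsplit

# Hypothetical URL; "alice" and "top" are made-up values.
url = "https://alice@toon.cs161.org:4000/draw?character=evan&size=big#top"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # alice@toon.cs161.org:4000  (the "location")
print(parts.username)  # alice
print(parts.hostname)  # toon.cs161.org  (the domain)
print(parts.port)      # 4000
print(parts.path)      # /draw
print(parts.query)     # character=evan&size=big
print(parts.fragment)  # top
```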

HTTP

  1. HTTP (Hypertext Transfer Protocol): A protocol used to request and retrieve data from a web server
  2. HTTPS: A secure version of HTTP
  3. HTTP is a request-response model
    • Web Browser sends a request to a Web Server
    • Web Server processes the request and sends a response back to the Web Browser

Components of HTTP Request

  • URL path (may contain query parameters)
  • Method:
    • GET: "get" info from the server; should not change server-side state
    • POST: "post" info to the server; may update server-side state
  • Data:
    • GET requests typically do not contain any data
    • POST requests can contain data

Components of HTTP Response

  • Status Code: indicates what happened with the request
    • 200: OK
    • 403: Access Forbidden
    • 404: Page Not Found
  • Data:
    • can be a web-page / image / PDF ...
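
To show how these components fit together on the wire, here is a simplified sketch that formats HTTP/1.1 requests as text and reads the status code from a response. The helper names build_request and parse_status are hypothetical, the response text is written by hand, and real browsers send many more headers than this.

```python
def build_request(method: str, path: str, host: str, body: str = "") -> str:
    """Format a minimal HTTP/1.1 request as text."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body.encode())}")
    # A blank line separates the headers from the (possibly empty) data.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_status(response: str) -> int:
    """Extract the status code from the first line of a response."""
    status_line = response.split("\r\n", 1)[0]  # e.g. "HTTP/1.1 200 OK"
    return int(status_line.split(" ")[1])


# GET: arguments travel in the query, and there is no data section.
get_req = build_request("GET", "/draw?character=evan&size=big", "toon.cs161.org")

# POST: the same arguments travel as data after the headers.
post_req = build_request("POST", "/draw", "toon.cs161.org", "character=evan&size=big")

# Hypothetical response, written by hand for illustration.
response = "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
print(parse_status(response))  # 404
```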

Parts of a Webpage

  1. HTML: A markup language to create structured documents
  2. CSS: A style sheet language
  3. JavaScript: A programming language for running code in the web browser
    • client-side: runs in the browser, not on the server
    • can manipulate HTML and CSS, making pages more interactive

How to render a webpage

  1. Browser receives HTML, CSS and JavaScript from Server
  2. HTML and CSS are parsed into a DOM
  3. JavaScript is interpreted and executed, possibly modifying the DOM
  4. The painter uses the DOM to draw the webpage

DOM

DOM (Document Object Model):

  1. A cross-platform model for representing and interacting with objects in HTML
  2. A language-independent interface that treats an XML or HTML document as a tree structure
  3. Each node in this tree has a tag, attributes, and child nodes
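
The tree structure can be explored with a small sketch. It uses xml.dom.minidom from Python's standard library as a stand-in: browsers have their own HTML parsers, and this XML parser only accepts well-formed markup. The page content and the walk helper are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical page, kept well-formed so an XML parser can read it.
page = '<html><body><h1 id="title">Hello</h1><p class="note">World</p></body></html>'
doc = parseString(page)


def walk(node, depth=0):
    """Return one line per element: indentation, tag, and attributes."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        attrs = dict(node.attributes.items())
        lines.append("  " * depth + f"{node.tagName} {attrs}")
        for child in node.childNodes:
            lines.extend(walk(child, depth + 1))
    return lines


tree = walk(doc.documentElement)
print("\n".join(tree))
# html {}
#   body {}
#     h1 {'id': 'title'}
#     p {'class': 'note'}
```

Each printed line is one node: its indentation reflects its depth in the tree, followed by its tag and attributes.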

Risks on the Web

  • Risk #1: Web servers should be protected from unauthorized access
    • Protection: Server-side security
  • Risk #2: A malicious website should not be able to damage our computer
    • Protection: Sandboxing
      • JS is not allowed to access files on our computer
      • Review: Privilege Separation / Least Privilege
  • Risk #3: A malicious website should not be able to tamper with our interactions with other websites
    • Protection: Same-Origin Policy, where the web browser prevents a webpage from accessing data from other, unrelated websites

Sandboxing

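In web development, sandboxing is a security mechanism that restricts what running code is allowed to do, in order to prevent malicious or unauthorized operations. It is typically used to protect the browser or application from potential security threats by ensuring that code can only run in a controlled, isolated environment.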

Same-Origin Policy

Same-Origin Policy: A rule that prevents one website from tampering with another website

  1. Trait: Enforced by Web Browser
  2. Principle: Two webpages have the same origin if and only if the protocol, domain, and port of the URL all match exactly

If no port is specified, the default is 80 for HTTP and 443 for HTTPS
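
The matching rule can be sketched as a comparison of (protocol, domain, port) triples. This is an illustrative approximation built on Python's standard urllib.parse module, not how any particular browser implements the check; the helper names origin and same_origin and the test URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports stated above: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (protocol, domain, port) triple used for comparison."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if all three parts match exactly."""
    return origin(a) == origin(b)


# Same origin: the explicit port 443 equals the HTTPS default; paths are ignored.
print(same_origin("https://toon.cs161.org/a.html", "https://toon.cs161.org:443/b.html"))  # True
# Different protocol.
print(same_origin("http://toon.cs161.org/", "https://toon.cs161.org/"))  # False
# Different domain (no partial matching on parent domains).
print(same_origin("https://toon.cs161.org/", "https://cs161.org/"))  # False
# Different port.
print(same_origin("http://toon.cs161.org/", "http://toon.cs161.org:8080/"))  # False
```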
