Server Setup Notes: Parsing Simple Requests with an Apache Module

方亮
Published 2019-01-16 14:27:12

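Generally, a server parses every incoming request to determine whether it is valid and how it should be routed. This section therefore explains how to obtain the data carried by a request. (If you repost this article, please credit breaksoftware's CSDN blog.)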

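Using the method described in "Server Setup Notes: Compiling Apache and Its Modules", we create a handler project named get_request. In this project, the entry function we can work with is: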

static int get_request_handler(request_rec *r)
{
    r->content_type = "text/html";
    /* ... the request-printing code shown later in this article goes here ... */
    return OK;
}

The only data this entry function gives us directly is r, a pointer to a request_rec structure. Looking through the source code, we find its definition:

/**
 * @brief A structure that represents the current request
 */
struct request_rec {
    /** The pool associated with the request */
    apr_pool_t *pool;
    /** The connection to the client */
    conn_rec *connection;
    /** The virtual host for this request */
    server_rec *server;

    /** Pointer to the redirected request if this is an external redirect */
    request_rec *next;
    /** Pointer to the previous request if this is an internal redirect */
    request_rec *prev;

    /** Pointer to the main request if this is a sub-request
     * (see http_request.h) */
    request_rec *main;

    /* Info about the request itself... we begin with stuff that only
     * protocol.c should ever touch...
     */
    /** First line of request */
    char *the_request;
    /** HTTP/0.9, "simple" request (e.g. GET /foo\n w/no headers) */
    int assbackwards;
    /** A proxy request (calculated during post_read_request/translate_name)
     *  possible values PROXYREQ_NONE, PROXYREQ_PROXY, PROXYREQ_REVERSE,
     *                  PROXYREQ_RESPONSE
     */
    int proxyreq;
    /** HEAD request, as opposed to GET */
    int header_only;
    /** Protocol version number of protocol; 1.1 = 1001 */
    int proto_num;
    /** Protocol string, as given to us, or HTTP/0.9 */
    char *protocol;
    /** Host, as set by full URI or Host: */
    const char *hostname;

    /** Time when the request started */
    apr_time_t request_time;

    /** Status line, if set by script */
    const char *status_line;
    /** Status line */
    int status;

    /* Request method, two ways; also, protocol, etc..  Outside of protocol.c,
     * look, but don't touch.
     */

    /** M_GET, M_POST, etc. */
    int method_number;
    /** Request method (eg. GET, HEAD, POST, etc.) */
    const char *method;

    /**
     *  'allowed' is a bitvector of the allowed methods.
     *
     *  A handler must ensure that the request method is one that
     *  it is capable of handling.  Generally modules should DECLINE
     *  any request methods they do not handle.  Prior to aborting the
     *  handler like this the handler should set r->allowed to the list
     *  of methods that it is willing to handle.  This bitvector is used
     *  to construct the "Allow:" header required for OPTIONS requests,
     *  and HTTP_METHOD_NOT_ALLOWED and HTTP_NOT_IMPLEMENTED status codes.
     *
     *  Since the default_handler deals with OPTIONS, all modules can
     *  usually decline to deal with OPTIONS.  TRACE is always allowed,
     *  modules don't need to set it explicitly.
     *
     *  Since the default_handler will always handle a GET, a
     *  module which does *not* implement GET should probably return
     *  HTTP_METHOD_NOT_ALLOWED.  Unfortunately this means that a Script GET
     *  handler can't be installed by mod_actions.
     */
    apr_int64_t allowed;
    /** Array of extension methods */
    apr_array_header_t *allowed_xmethods;
    /** List of allowed methods */
    ap_method_list_t *allowed_methods;

    /** byte count in stream is for body */
    apr_off_t sent_bodyct;
    /** body byte count, for easy access */
    apr_off_t bytes_sent;
    /** Last modified time of the requested resource */
    apr_time_t mtime;

    /* HTTP/1.1 connection-level features */

    /** The Range: header */
    const char *range;
    /** The "real" content length */
    apr_off_t clength;
    /** sending chunked transfer-coding */
    int chunked;

    /** Method for reading the request body
     * (eg. REQUEST_CHUNKED_ERROR, REQUEST_NO_BODY,
     *  REQUEST_CHUNKED_DECHUNK, etc...) */
    int read_body;
    /** reading chunked transfer-coding */
    int read_chunked;
    /** is client waiting for a 100 response? */
    unsigned expecting_100;
    /** The optional kept body of the request. */
    apr_bucket_brigade *kept_body;
    /** For ap_body_to_table(): parsed body */
    /* XXX: ap_body_to_table has been removed. Remove body_table too or
     * XXX: keep it to reintroduce ap_body_to_table without major bump? */
    apr_table_t *body_table;
    /** Remaining bytes left to read from the request body */
    apr_off_t remaining;
    /** Number of bytes that have been read  from the request body */
    apr_off_t read_length;

    /* MIME header environments, in and out.  Also, an array containing
     * environment variables to be passed to subprocesses, so people can
     * write modules to add to that environment.
     *
     * The difference between headers_out and err_headers_out is that the
     * latter are printed even on error, and persist across internal redirects
     * (so the headers printed for ErrorDocument handlers will have them).
     *
     * The 'notes' apr_table_t is for notes from one module to another, with no
     * other set purpose in mind...
     */

    /** MIME header environment from the request */
    apr_table_t *headers_in;
    /** MIME header environment for the response */
    apr_table_t *headers_out;
    /** MIME header environment for the response, printed even on errors and
     * persist across internal redirects */
    apr_table_t *err_headers_out;
    /** Array of environment variables to be used for sub processes */
    apr_table_t *subprocess_env;
    /** Notes from one module to another */
    apr_table_t *notes;

    /* content_type, handler, content_encoding, and all content_languages
     * MUST be lowercased strings.  They may be pointers to static strings;
     * they should not be modified in place.
     */
    /** The content-type for the current request */
    const char *content_type;   /* Break these out --- we dispatch on 'em */
    /** The handler string that we use to call a handler function */
    const char *handler;        /* What we *really* dispatch on */

    /** How to encode the data */
    const char *content_encoding;
    /** Array of strings representing the content languages */
    apr_array_header_t *content_languages;

    /** variant list validator (if negotiated) */
    char *vlist_validator;

    /** If an authentication check was made, this gets set to the user name. */
    char *user;
    /** If an authentication check was made, this gets set to the auth type. */
    char *ap_auth_type;

    /* What object is being requested (either directly, or via include
     * or content-negotiation mapping).
     */

    /** The URI without any parsing performed */
    char *unparsed_uri;
    /** The path portion of the URI, or "/" if no path provided */
    char *uri;
    /** The filename on disk corresponding to this response */
    char *filename;
    /* XXX: What does this mean? Please define "canonicalize" -aaron */
    /** The true filename, we canonicalize r->filename if these don't match */
    char *canonical_filename;
    /** The PATH_INFO extracted from this request */
    char *path_info;
    /** The QUERY_ARGS extracted from this request */
    char *args;

    /**
     * Flag for the handler to accept or reject path_info on
     * the current request.  All modules should respect the
     * AP_REQ_ACCEPT_PATH_INFO and AP_REQ_REJECT_PATH_INFO
     * values, while AP_REQ_DEFAULT_PATH_INFO indicates they
     * may follow existing conventions.  This is set to the
     * user's preference upon HOOK_VERY_FIRST of the fixups.
     */
    int used_path_info;

    /** A flag to determine if the eos bucket has been sent yet */
    int eos_sent;

    /* Various other config info which may change with .htaccess files
     * These are config vectors, with one void* pointer for each module
     * (the thing pointed to being the module's business).
     */

    /** Options set in config files, etc. */
    struct ap_conf_vector_t *per_dir_config;
    /** Notes on *this* request */
    struct ap_conf_vector_t *request_config;

    /** Optional request log level configuration. Will usually point
     *  to a server or per_dir config, i.e. must be copied before
     *  modifying */
    const struct ap_logconf *log;

    /** Id to identify request in access and error log. Set when the first
     *  error log entry for this request is generated.
     */
    const char *log_id;

    /**
     * A linked list of the .htaccess configuration directives
     * accessed by this request.
     * N.B. always add to the head of the list, _never_ to the end.
     * that way, a sub request's list can (temporarily) point to a parent's list
     */
    const struct htaccess_result *htaccess;

    /** A list of output filters to be used for this request */
    struct ap_filter_t *output_filters;
    /** A list of input filters to be used for this request */
    struct ap_filter_t *input_filters;

    /** A list of protocol level output filters to be used for this
     *  request */
    struct ap_filter_t *proto_output_filters;
    /** A list of protocol level input filters to be used for this
     *  request */
    struct ap_filter_t *proto_input_filters;

    /** This response can not be cached */
    int no_cache;
    /** There is no local copy of this response */
    int no_local_copy;

    /** Mutex protect callbacks registered with ap_mpm_register_timed_callback
     * from being run before the original handler finishes running
     */
    apr_thread_mutex_t *invoke_mtx;

    /** A struct containing the components of URI */
    apr_uri_t parsed_uri;
    /**  finfo.protection (st_mode) set to zero if no such file */
    apr_finfo_t finfo;

    /** remote address information from conn_rec, can be overridden if
     * necessary by a module.
     * This is the address that originated the request.
     */
    apr_sockaddr_t *useragent_addr;
    char *useragent_ip;

    /** MIME trailer environment from the request */
    apr_table_t *trailers_in;
    /** MIME trailer environment from the response */
    apr_table_t *trailers_out;
};

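This is a very large structure that covers almost everything about a request. For beginners, fully understanding every field is rather difficult. Our needs are simple, so we list only the fields we are likely to care about.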

    /** First line of request */
    char *the_request;

The first line of the request.
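
To make the layout of that first line concrete, here is a minimal plain-C sketch that splits a request line into its three space-separated parts. It does not use any Apache API; `request_line_t` and `split_request_line` are hypothetical names introduced only for illustration, and the fixed buffer sizes are arbitrary assumptions.

```c
#include <stdio.h>

/* Hypothetical holder for the three parts of a request line. */
typedef struct {
    char method[16];
    char target[256];
    char protocol[16];
} request_line_t;

/* Splits "METHOD target PROTOCOL" into its parts.
 * Returns 1 on success, 0 if the line does not contain three fields. */
static int split_request_line(const char *line, request_line_t *out)
{
    return sscanf(line, "%15s %255s %15s",
                  out->method, out->target, out->protocol) == 3;
}
```

Calling `split_request_line("GET /AP%26AC%3aHE?a=b HTTP/1.1", &rl)` fills in the method, the still-encoded request target, and the protocol string, which correspond to what a module reads from r->method, r->unparsed_uri, and r->protocol.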

    /** Protocol version number of protocol; 1.1 = 1001 */
    int proto_num;
    /** Protocol string, as given to us, or HTTP/0.9 */
    char *protocol;
    /** Host, as set by full URI or Host: */
    const char *hostname;

The protocol version number, the protocol string, and the host name.
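
The comment "1.1 = 1001" suggests that proto_num packs the version as 1000 × major + minor. The sketch below works under that assumption and shows how a protocol string maps to such a number. `version_number` and `protocol_to_number` are hypothetical helper names, not part of Apache.

```c
#include <stdio.h>

/* Encodes a major/minor pair the way the proto_num comment describes:
 * version 1.1 becomes 1001. */
static int version_number(int major, int minor)
{
    return 1000 * major + minor;
}

/* Parses a protocol string such as "HTTP/1.1".
 * Returns the encoded number, or -1 if the string is not in that form. */
static int protocol_to_number(const char *protocol)
{
    int major, minor;
    if (sscanf(protocol, "HTTP/%d.%d", &major, &minor) != 2) {
        return -1;
    }
    return version_number(major, minor);
}
```

Storing the version as one integer makes comparisons such as "is this at least HTTP/1.1" a single `>=` test instead of a string comparison.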

    /** Time when the request started */
    apr_time_t request_time;

The time at which the request started.

    /** The URI without any parsing performed */
    char *unparsed_uri;
    /** The path portion of the URI, or "/" if no path provided */
    char *uri;
    /** The filename on disk corresponding to this response */
    char *filename;

The URI before URL-decoding, the URL-decoded path portion of the URI, and the path of the file on disk that the request maps to.
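
The difference between unparsed_uri and uri comes down to percent-decoding. The following is a minimal plain-C sketch of that step, not Apache's own implementation; `hex_value` and `percent_decode` are hypothetical names, and the choice to copy malformed escapes through unchanged is an assumption made for simplicity.

```c
#include <ctype.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes %XX escapes from src into dst (dst needs strlen(src) + 1 bytes).
 * Malformed escapes are copied through unchanged. Returns dst. */
static char *percent_decode(const char *src, char *dst)
{
    char *out = dst;
    while (*src) {
        int hi, lo;
        if (src[0] == '%' && (hi = hex_value(src[1])) >= 0
                          && (lo = hex_value(src[2])) >= 0) {
            *out++ = (char)(hi * 16 + lo);   /* two hex digits form one byte */
            src += 3;
        } else {
            *out++ = *src++;
        }
    }
    *out = '\0';
    return dst;
}
```

Decoding "/AP%26AC%3aHE" this way gives "/AP&AC:HE". A production decoder would also have to decide how to treat escapes such as %00, which this sketch ignores.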

    /** The PATH_INFO extracted from this request */
    char *path_info;
    /** The QUERY_ARGS extracted from this request */
    char *args;

The extra path information (PATH_INFO) and the query string of the request.
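
Since args holds the raw query string (for example "a=b"), a handler usually wants individual parameters out of it. Here is a small plain-C sketch of such a lookup. `query_get` is a hypothetical helper, it assumes pairs are separated by '&', and it returns the value without percent-decoding it.

```c
#include <string.h>

/* Looks up `key` in a query string such as "a=b&c=d" and copies its raw
 * (still percent-encoded) value into out, truncating if out is too small.
 * Returns 1 if the key was found, 0 otherwise. args may be NULL. */
static int query_get(const char *args, const char *key,
                     char *out, size_t out_size)
{
    size_t key_len = strlen(key);
    const char *p = args;

    if (out_size == 0) {
        return 0;
    }
    while (p && *p) {
        const char *end = strchr(p, '&');            /* end of this pair */
        size_t pair_len = end ? (size_t)(end - p) : strlen(p);

        if (pair_len > key_len && strncmp(p, key, key_len) == 0
                && p[key_len] == '=') {
            size_t val_len = pair_len - key_len - 1;
            if (val_len >= out_size) {
                val_len = out_size - 1;              /* truncate to fit */
            }
            memcpy(out, p + key_len + 1, val_len);
            out[val_len] = '\0';
            return 1;
        }
        p = end ? end + 1 : NULL;                    /* move to next pair */
    }
    return 0;
}
```

The NULL check matters in practice, because a request without a query string leaves args unset.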

    /** A struct containing the components of URI */
    apr_uri_t parsed_uri;

The detailed result of parsing the request URI.

    char *useragent_ip;

The IP address the request came from.

/** MIME header environment from the request */
    apr_table_t *headers_in;

The HTTP request headers, stored as a table.
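
HTTP header names are case-insensitive, so a header table has to be searched without regard to case. The sketch below models that idea in plain C with an array of key/value pairs. `header_entry_t`, `equal_ignore_case`, and `header_get` are hypothetical stand-ins, not APR types or functions. In a real module one would normally call apr_table_get on r->headers_in instead.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for one table entry: a key/value string pair. */
typedef struct {
    const char *key;
    const char *val;
} header_entry_t;

/* Case-insensitive string equality, written out to stay within standard C. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Returns the value of the first entry whose key matches, or NULL. */
static const char *header_get(const header_entry_t *entries, size_t count,
                              const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (equal_ignore_case(entries[i].key, name)) {
            return entries[i].val;
        }
    }
    return NULL;
}
```

With entries for "Host" and "Connection", a lookup for "host" or "CONNECTION" still finds the value, and a missing header yields NULL, which the caller must check before printing.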

For the basic data types, sample code is easy to write:

	if (r->the_request) {
		ap_rprintf(r, "the request : %s\n", r->the_request);
	}
	else {
		ap_rprintf(r, "the request is NULL\n");
	}

	if (r->protocol) {
		ap_rprintf(r, "protocol : %s\n", r->protocol);
	}
	else {
		ap_rprintf(r, "protocol is NULL\n");
	}

	ap_rprintf(r, "proto_num is %d\n", r->proto_num);

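For the request time, which is of type apr_time_t, we can refer to the introduction to modules in "Server Setup Notes: Apache Module Development Basics". After reading the source code, we can write the following routine: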

static void print_time(request_rec* r) {
	if (!r) {
		/* There is nothing to write to when r is NULL, so just return. */
		return;
	}
	char data_str[128] = {0};
	apr_status_t status = apr_ctime(data_str, r->request_time);
	if (APR_SUCCESS != status) {
		ap_rprintf(r, "apr_ctime error\n");	
	}
	else {
		ap_rprintf(r, "ctime\t:\t%s\n", data_str);
	}

	apr_time_exp_t exp_t;
	memset(&exp_t, 0, sizeof(exp_t));
	status = apr_time_exp_gmt(&exp_t, r->request_time);
	if (APR_SUCCESS != status) {
		ap_rprintf(r, "apr_time_exp_gmt error\n");
	}
	else {
		ap_rprintf(r, "exp time\t:\n");
		ap_rprintf(r, "\ttm_usec\t:\t%d\n", exp_t.tm_usec);
		ap_rprintf(r, "\ttm_sec\t:\t%d\n", exp_t.tm_sec);
		ap_rprintf(r, "\ttm_min\t:\t%d\n", exp_t.tm_min);
		ap_rprintf(r, "\ttm_hour\t:\t%d\n", exp_t.tm_hour);
		ap_rprintf(r, "\ttm_mday\t:\t%d\n", exp_t.tm_mday);
		ap_rprintf(r, "\ttm_mon\t:\t%d\n", exp_t.tm_mon);
		ap_rprintf(r, "\ttm_year\t:\t%d\n", exp_t.tm_year);
		ap_rprintf(r, "\ttm_wday\t:\t%d\n", exp_t.tm_wday);
		ap_rprintf(r, "\ttm_yday\t:\t%d\n", exp_t.tm_yday);
		ap_rprintf(r, "\ttm_isdst\t:\t%d\n", exp_t.tm_isdst);
		ap_rprintf(r, "\ttm_gmtoff\t:\t%d\n", exp_t.tm_gmtoff);
	}
}

apr_time_exp_t is defined in apr_time.h:

/**
 * a structure similar to ANSI struct tm with the following differences:
 *  - tm_usec isn't an ANSI field
 *  - tm_gmtoff isn't an ANSI field (it's a BSDism)
 */
struct apr_time_exp_t {
    /** microseconds past tm_sec */
    apr_int32_t tm_usec;
    /** (0-61) seconds past tm_min */
    apr_int32_t tm_sec;
    /** (0-59) minutes past tm_hour */
    apr_int32_t tm_min;
    /** (0-23) hours past midnight */
    apr_int32_t tm_hour;
    /** (1-31) day of the month */
    apr_int32_t tm_mday;
    /** (0-11) month of the year */
    apr_int32_t tm_mon;
    /** year since 1900 */
    apr_int32_t tm_year;
    /** (0-6) days since Sunday */
    apr_int32_t tm_wday;
    /** (0-365) days since January 1 */
    apr_int32_t tm_yday;
    /** daylight saving time */
    apr_int32_t tm_isdst;
    /** seconds east of UTC */
    apr_int32_t tm_gmtoff;
};
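
To see where fields such as tm_usec and tm_hour come from, here is a standard-C sketch of the same kind of expansion. It assumes the timestamp counts microseconds since the Unix epoch, and it uses the C library's gmtime rather than APR. `expand_usec_time` is a hypothetical name, and times before 1970 are not handled.

```c
#include <stdint.h>
#include <time.h>

/* Splits a microsecond timestamp into whole seconds and leftover
 * microseconds, then expands the seconds into UTC calendar fields.
 * Returns 1 on success, 0 if gmtime() fails. */
static int expand_usec_time(int64_t usec_since_epoch,
                            struct tm *out, int32_t *out_usec)
{
    time_t secs = (time_t)(usec_since_epoch / 1000000);
    const struct tm *tmp = gmtime(&secs);   /* not thread-safe; fine for a demo */

    if (tmp == NULL) {
        return 0;
    }
    *out = *tmp;                                        /* copy the static result */
    *out_usec = (int32_t)(usec_since_epoch % 1000000);  /* sub-second remainder */
    return 1;
}
```

As with struct tm, the month is zero-based and the year counts from 1900, so February 2015 appears as tm_mon = 1 and tm_year = 115.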

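The routine for apr_uri_t, the structure that holds the already-parsed URI, is also very simple, so I will not list it here. I only paste the structure definition, which is self-explanatory: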

/**
 * A structure to encompass all of the fields in a uri
 */
struct apr_uri_t {
    /** scheme ("http"/"ftp"/...) */
    char *scheme;
    /** combined [user[:password]\@]host[:port] */
    char *hostinfo;
    /** user name, as in http://user:passwd\@host:port/ */
    char *user;
    /** password, as in http://user:passwd\@host:port/ */
    char *password;
    /** hostname from URI (or from Host: header) */
    char *hostname;
    /** port string (integer representation is in "port") */
    char *port_str;
    /** the request path (or NULL if only scheme://host was given) */
    char *path;
    /** Everything after a '?' in the path, if present */
    char *query;
    /** Trailing "#fragment" string, if present */
    char *fragment;

    /** structure returned from gethostbyname() */
    struct hostent *hostent;

    /** The port number, numeric, valid only if port_str != NULL */
    apr_port_t port;
    
    /** has the structure been initialized */
    unsigned is_initialized:1;

    /** has the DNS been looked up yet */
    unsigned dns_looked_up:1;
    /** has the dns been resolved yet */
    unsigned dns_resolved:1;
};
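
The path, query, and fragment fields correspond to a simple split of the request target at '?' and '#'. Below is a plain-C sketch of that split. It is not how APR parses URIs; `uri_parts_t`, `copy_span`, and `split_uri` are hypothetical names, the buffer sizes are arbitrary, and no percent-decoding is performed.

```c
#include <string.h>

/* Hypothetical holder for the three parts this sketch extracts. */
typedef struct {
    char path[256];
    char query[256];
    char fragment[64];
    int has_query;
    int has_fragment;
} uri_parts_t;

/* Copies len bytes of src into dst, truncating to fit and terminating. */
static void copy_span(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size) {
        len = dst_size - 1;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Splits "path?query#fragment"; the query and fragment are optional. */
static void split_uri(const char *uri, uri_parts_t *out)
{
    const char *hash = strchr(uri, '#');
    const char *end = hash ? hash : uri + strlen(uri);   /* end of path+query */
    const char *qmark = (const char *)memchr(uri, '?', (size_t)(end - uri));
    const char *path_end = qmark ? qmark : end;

    copy_span(out->path, sizeof out->path, uri, (size_t)(path_end - uri));
    out->has_query = qmark != NULL;
    copy_span(out->query, sizeof out->query,
              qmark ? qmark + 1 : end, qmark ? (size_t)(end - qmark - 1) : 0);
    out->has_fragment = hash != NULL;
    copy_span(out->fragment, sizeof out->fragment,
              hash ? hash + 1 : end, hash ? strlen(hash + 1) : 0);
}
```

The has_query and has_fragment flags play the role that NULL pointers play in apr_uri_t: they distinguish "absent" from "present but empty".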

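The troublesome part of these routines is walking an apr_table_t. Code that iterates over such a table is hard to find online, so I worked out the following by referring to the code in apr_table_clone: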

static void print_table(request_rec *r, const apr_table_t* t) {
	const apr_array_header_t* array = apr_table_elts(t);
	apr_table_entry_t* elts = (apr_table_entry_t*)array->elts;
	for (int i = 0; i < array->nelts; i++) {
		ap_rprintf(r, "\t%s : %s\n", elts[i].key, elts[i].val);
	}
}

We request the URL http://192.168.191.129/AP%26AC%3aHE?a=b#c

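The response is shown below. The fragment (#c) is missing from the request line and is reported as NULL, because browsers do not send the fragment to the server.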

headers_in start
	Host : 192.168.191.129
	Connection : keep-alive
	Cache-Control : max-age=0
	Accept : text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
	User-Agent : Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
	Accept-Encoding : gzip,deflate,sdch
	Accept-Language : zh-CN,zh;q=0.8
headers_in end

headers_out start
headers_out end

the request : GET /AP%26AC%3aHE?a=b HTTP/1.1
protocol : HTTP/1.1
proto_num is 1001
method : GET
host name : 192.168.191.129
unparsed uri : /AP%26AC%3aHE?a=b
uri : /AP&AC:HE
filename : /usr/local/apache2/htdocs/AP&AC:HE
path info : 
args : a=b
user is NULL
log id is NULL
useragent ip : 192.168.191.1
ctime	:	Mon Feb 16 18:20:39 2015
exp time	:
	tm_usec	:	200039
	tm_sec	:	39
	tm_min	:	20
	tm_hour	:	10
	tm_mday	:	16
	tm_mon	:	1
	tm_year	:	115
	tm_wday	:	1
	tm_yday	:	46
	tm_isdst	:	0
	tm_gmtoff	:	0
scheme is NULL
hostinfo is NULL
user is NULL
password is NULL
hostname is NULL
port_str is NULL
path : /AP&AC:HE
query : a=b
fragment is NULL
The sample page from mod_get_request.c

Originally published: 2015-02-28.