885b92b88f | ||
---|---|---|
README.md | ||
log.sql | ||
urldecode.sh | ||
westdclog.sh |
README.md
系统安装
数据库扩展
在系统使用中,需要向科学数据中心的数据库中插入对应的数据访问表。
CREATE TABLE accesslog1
(
id serial NOT NULL,
method character varying(10),
path character varying,
protocol character varying,
referer character varying,
size integer,
status integer,
datetime timestamp without time zone,
useragent character varying,
ip inet,
uuid uuid,
CONSTRAINT accesslog1_pkey PRIMARY KEY (id)
) WITH ( OIDS=FALSE);
CREATE INDEX accesslog1_path_idx
ON accesslog1
USING btree
(path COLLATE pg_catalog."zh_CN.utf8");
CREATE INDEX accesslog1_referer_idx
ON accesslog1
USING btree
(referer COLLATE pg_catalog."zh_CN.utf8");
CREATE TABLE tmplog
(
log text,
id serial NOT NULL,
CONSTRAINT tmplog_pkey PRIMARY KEY (id)
) WITH (OIDS=FALSE);
其中,tmplog表用来临时存放访问日志数据,处理完成后放到accesslog1表中。
工具扩展
在服务器系统中需要安装mergelog和cronolog工具,若系统尚未安装,可以通过以下方法安装(需要有root权限)。
sudo apt-get update
sudo apt-get install mergelog cronolog
系统部署和运行
将系统代码(两个sh脚本和log.sql文件)拷贝到服务器某个目录下(假定为/root/westdclog)。假定服务器的访问日志存放于/var/log/apache2目录下。调整Apache2网站的访问日志为逐日生成,这样就能保存所有的访问日志。
CustomLog "|/usr/bin/cronolog /var/log/apache2/myweb-access-%Y%m%d.log" combined
然后修改westdclog.sh中的路径设置。修改完成后,可以通过crontab工具执行自动定时处理。
sudo crontab -e
将系统执行时间放到0点以后(为0时07分)。
7 0 * * * /root/westdclog/westdclog.sh
科学数据共享访问日志数据就这样每天定时运行,运行时处理前一天的网络访问日志数据。
系统使用
数据定时入库后,就可以通过数据库进行查询和处理每日的访问信息。我们给出一些SQL调用样例。限定每IP每日访问一条数据为一次正式访问量。
逐数据访问统计
查询所有数据的访问情况:
select a.uuid,visit_count,b.ts_created from
(
select m.uuid,count(*) as visit_count from heihemetadata m
left join mdstatus s on m.uuid=s.uuid
left join
(select distinct uuid,ip,datetime::date as visit_date from accesslog1) a on m.uuid=a.uuid
where s.status>4
group by m.uuid
) a
left join
(select m.uuid,m.ts_created from metadata m) b
on a.uuid=b.uuid
order by b.ts_created
逐月访问统计
查询已发布数据的逐月访问情况:
select extract(year from a.visit_date)::text||'-'||extract(month from a.visit_date)::text as ym,count(*) as visit_count from metadata m
left join mdstatus s on m.uuid=s.uuid
left join
(select distinct uuid,ip,datetime::date as visit_date from accesslog1) a on m.uuid=a.uuid
where s.status>4
group by extract(month from a.visit_date),extract(year from a.visit_date)
order by ym
访问量最多的前5条数据集
统计访问量最多的前5条数据:
select m.uuid,m.title,count(*) as visit_count from metadata m
left join mdstatus s on m.uuid=s.uuid
left join
(select distinct uuid,ip,datetime::date as visit_date from accesslog1) a on m.uuid=a.uuid
where s.status>4
group by m.uuid,m.title
order by visit_count desc
limit 5
不同时段的累计访问量
统计不同时段的累计访问量,包括24小时、1周、12个月。 24小时的每个小时累计访问量(每小时每IP每数据统计为一次有效访问):
select hour,count(*) from
(select distinct uuid,ip,datetime::date as visit_date,extract(hour from datetime) as hour from accesslog1) as a
group by hour
order by hour
周内每天的累计访问量(每天每IP每数据统计为一次有效访问):
select dow,count(*) from
(select distinct uuid,ip,datetime::date as visit_date,extract(dow from datetime) as dow from accesslog1) as a
group by dow
order by dow
月内每天的累计访问量(每天每IP每数据统计为一次有效访问):
select day,count(*) from
(select distinct uuid,ip,datetime::date as visit_date,extract(day from datetime) as day from accesslog1) as a
group by day
order by day
每个月的累计访问量(每天每IP每数据统计为一次有效访问):
select month,count(*) from
(select distinct uuid,ip,datetime::date as visit_date,extract(month from datetime) as month from accesslog1) as a
group by month
order by month