Go to file
wlx 885b92b88f Fix access problem 2016-05-25 17:31:11 +08:00
README.md code import 2015-05-14 11:07:32 +08:00
log.sql Fix access problem, mv tmp file to /tmp 2016-05-20 16:49:21 +08:00
urldecode.sh code import 2015-05-14 11:07:32 +08:00
westdclog.sh Fix access problem 2016-05-25 17:31:11 +08:00

README.md

系统安装

数据库扩展

在系统使用中,需要向科学数据中心的数据库中插入对应的数据访问表。

CREATE TABLE accesslog1
(
  id serial NOT NULL,
  method character varying(10),
  path character varying,
  protocol character varying,
  referer character varying,
  size integer,
  status integer,
  datetime timestamp without time zone,
  useragent character varying,
  ip inet,
  uuid uuid,
  CONSTRAINT accesslog1_pkey PRIMARY KEY (id)
) WITH (  OIDS=FALSE);

CREATE INDEX accesslog1_path_idx
  ON accesslog1
  USING btree
  (path COLLATE pg_catalog."zh_CN.utf8");

CREATE INDEX accesslog1_referer_idx
  ON accesslog1
  USING btree
  (referer COLLATE pg_catalog."zh_CN.utf8");

CREATE TABLE tmplog
(
  log text,
  id serial NOT NULL,
  CONSTRAINT tmplog_pkey PRIMARY KEY (id)
) WITH (OIDS=FALSE);

其中tmplog表用来临时存放访问日志数据处理完成后放到accesslog1表中。

工具扩展

在服务器系统中需要安装mergelog和cronolog工具若系统尚未安装可以通过以下方法安装需要有root权限

sudo apt-get update
sudo apt-get install mergelog cronolog

系统部署和运行

将系统代码两个sh脚本和log.sql文件拷贝到服务器某个目录下假定为/root/westdclog。假定服务器的访问日志存放于/var/log/apache2目录下。调整Apache2网站的访问日志为逐日生成这样就能保存所有的访问日志。

CustomLog "|/usr/bin/cronolog /var/log/apache2/myweb-access-%Y%m%d.log" combined

然后修改westdclog.sh中的路径设置。修改完成后可以通过crontab工具执行自动定时处理。

sudo crontab -e

将系统执行时间放到0点以后为0时07分

7 0 * * * /root/westdclog/westdclog.sh 

科学数据共享访问日志数据就这样每天定时运行,运行时处理前一天的网络访问日志数据。

系统使用

数据定时入库后就可以通过数据库进行查询和处理每日的访问信息。我们给出一些SQL调用样例。限定每IP每日访问一条数据为一次正式访问量。

逐数据访问统计

查询所有数据的访问情况:

select a.uuid,visit_count,b.ts_created from
(
select m.uuid,count(*) as visit_count from heihemetadata m
left join mdstatus s on m.uuid=s.uuid
left join
(select distinct uuid,ip,datetime::date as visit_date from accesslog1) a on m.uuid=a.uuid
where s.status>4
group by m.uuid
) a
left join
(select m.uuid,m.ts_created from metadata m) b
on a.uuid=b.uuid
order by b.ts_created

逐月访问统计

查询已发布数据的逐月访问情况:

select extract(year from a.visit_date)::text||'-'||extract(month from a.visit_date)::text as ym,count(*) as visit_count from metadata m
left join mdstatus s on m.uuid=s.uuid
left join
(select distinct uuid,ip,datetime::date as visit_date from accesslog1) a on m.uuid=a.uuid
where s.status>4
group by extract(month from a.visit_date),extract(year from a.visit_date)
order by ym

访问量最多的前5条数据集

统计访问量最多的前5条数据

select m.uuid,m.title,count(*) as visit_count from metadata m
left join mdstatus s on m.uuid=s.uuid
left join
(select distinct uuid,ip,datetime::date as visit_date from accesslog1) a on m.uuid=a.uuid
where s.status>4
group by m.uuid,m.title
order by visit_count desc
limit 5

不同时段的累计访问量

统计不同时段的累计访问量包括24小时、1周、12个月。 24小时的每个小时累计访问量每小时每IP每数据统计为一次有效访问

select hour,count(*) from
(select distinct uuid,ip,datetime::date as visit_date,extract(hour from datetime) as hour from accesslog1) as a
group by hour
order by hour

周内每天的累计访问量每天每IP每数据统计为一次有效访问

select dow,count(*) from
(select distinct uuid,ip,datetime::date as visit_date,extract(dow from datetime) as dow from accesslog1) as a
group by dow
order by dow

月内每天的累计访问量每天每IP每数据统计为一次有效访问

select day,count(*) from
(select distinct uuid,ip,datetime::date as visit_date,extract(day from datetime) as day from accesslog1) as a
group by day
order by day

每个月的累计访问量每天每IP每数据统计为一次有效访问

select month,count(*) from
(select distinct uuid,ip,datetime::date as visit_date,extract(month from datetime) as month from accesslog1) as a
group by month
order by month